Poster #43 - Serin Jo
- vitod24
- Oct 20
Design of a Lightweight Algorithm for Robust Mass Spectrometry Peak Detection
Jo, Serin, Roslyn High School, Roslyn Heights, NY, USA
Li, Guangyuan, Ph.D., Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
Mass spectrometry (MS)-based proteomics remains the predominant approach for identifying and quantifying proteins and peptides across a wide range of applications. Preprocessing raw MS data is an important step for accurate downstream analysis. A key preprocessing challenge is peak detection, also known as centroiding: converting noisy raw profile data into centroided peaks that the subsequent peptide-spectrum matching steps can use. Unfortunately, centroiding still depends largely on proprietary "black box" software packages that are costly, closed source, and limit reproducibility across workflows. To systematically identify the best approach, I coded six novel centroiding approaches, each designed by combining parts of simpler solutions. Using immunopeptidome data acquired on an Orbitrap mass spectrometer, I implemented signal-to-noise ratio (SNR) thresholding, area under the curve (AUC) thresholding using Simpson's rule, local maxima detection, Gaussian fitting (by correlation and by convolution), and Mexican hat fitting. Each method was benchmarked against ground-truth centroiding performed by proprietary Bruker software and against the known peptide sequences of the samples. The traditional thresholding methods (SNR, AUC) achieved moderate performance (F1 = 0.60), while local maxima detection performed better (F1 = 0.70). Gaussian convolution improved noise handling (F1 = 0.65), whereas Gaussian correlation struggled with peak asymmetry (F1 = 0.30). The best-performing method was Mexican hat convolution (F1 = 0.74), which outperformed all other strategies by combining noise suppression with accurate peak localization. These findings suggest that wavelet-based convolution offers a lightweight, modular solution for peak detection, especially in complex datasets with overlapping peaks and baseline drift.
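The convolution-based peak picking described above can be illustrated with a short NumPy sketch. This is a minimal example under stated assumptions, not the author's implementation: the function names (`ricker`, `gaussian`, `robust_noise`, `simpson_area`, `pick_peaks`, `f1_against_reference`) are hypothetical, the noise level is assumed to come from the median absolute deviation (MAD), the m/z grid is assumed to be uniform, and the kernel widths, window sizes, and 10 ppm matching tolerance are illustrative choices. It shows how the pieces named in the abstract (SNR thresholding, local maxima detection, Gaussian or Mexican hat convolution, Simpson's-rule AUC, and F1 scoring against reference centroids) can be combined modularly. The spectrum is synthetic, not the Orbitrap immunopeptidome data, and the peptide-sequence benchmark is not reproduced.

```python
import numpy as np

# Hypothetical sketch: names, parameters, and the noise model are illustrative assumptions.


def ricker(a, half_width):
    """Mexican hat (Ricker) wavelet sampled at integer offsets; `a` is its width in points.
    Amplitude is left unnormalized: the SNR threshold below is relative, so scale cancels."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    return (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def gaussian(a, half_width):
    """Unit-sum Gaussian smoothing kernel; `a` is its standard deviation in points."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    k = np.exp(-(t ** 2) / (2.0 * a ** 2))
    return k / k.sum()


def robust_noise(x):
    """Assumed noise model: 1.4826 * MAD, which approximates the std of Gaussian noise."""
    return 1.4826 * float(np.median(np.abs(x - np.median(x))))


def simpson_area(y, dx):
    """Composite Simpson's rule on uniformly spaced samples; len(y) must be odd."""
    return dx / 3.0 * (y[0] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum() + y[-1])


def pick_peaks(mz, y, kernel=None, snr_min=5.0, half_window=6):
    """Return (centroid_mz, area) for each detected peak.

    kernel=None  -> local maxima of the raw profile, filtered by SNR
    kernel=array -> local maxima of the convolved profile, filtered by SNR
    """
    mz = np.asarray(mz, dtype=float)
    raw = np.asarray(y, dtype=float)
    f = raw if kernel is None else np.convolve(raw, kernel, mode="same")
    threshold = snr_min * robust_noise(f)
    dx = mz[1] - mz[0]  # assumes a uniform m/z grid
    peaks = []
    # peaks closer than half_window points to either edge are skipped
    for i in range(half_window, len(f) - half_window):
        # strict on the left, non-strict on the right, so a flat top is reported once
        if f[i] > f[i - 1] and f[i] >= f[i + 1] and f[i] > threshold:
            win = slice(i - half_window, i + half_window + 1)  # 2*half_window + 1 points (odd)
            w = np.clip(raw[win], 0.0, None)  # negative (noise) intensities get zero weight
            centroid = float(np.sum(mz[win] * w) / np.sum(w))
            peaks.append((centroid, float(simpson_area(raw[win], dx))))
    return peaks


def f1_against_reference(picked_mz, reference_mz, tol_ppm=10.0):
    """Greedy one-to-one matching of picked centroids to reference centroids within tol_ppm."""
    unmatched = list(reference_mz)
    tp = 0
    for p in picked_mz:
        best = min(unmatched, key=lambda r: abs(r - p), default=None)
        if best is not None and abs(best - p) / best * 1e6 <= tol_ppm:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(picked_mz) if picked_mz else 0.0
    recall = tp / len(reference_mz) if reference_mz else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Synthetic profile spectrum: three Gaussian peaks on a uniform grid plus white noise.
rng = np.random.default_rng(0)
mz = 400.0 + 0.005 * np.arange(2001)
true_mz, heights, sigma = [402.0, 405.0, 407.5], [60.0, 100.0, 80.0], 0.01
y = sum(h * np.exp(-((mz - c) ** 2) / (2 * sigma ** 2)) for c, h in zip(true_mz, heights))
y = y + rng.normal(0.0, 1.0, mz.size)

hat = pick_peaks(mz, y, kernel=ricker(a=2.0, half_width=10))
print([round(c, 3) for c, _ in hat])
print(f1_against_reference([c for c, _ in hat], true_mz))
```

Passing `kernel=None` gives plain local maxima with SNR thresholding, and passing `gaussian(...)` gives a Gaussian-convolution variant. Because the sampled Mexican hat kernel is symmetric and sums to approximately zero, convolving with it also cancels a constant or linearly varying baseline, which is one plausible reason a wavelet approach copes well with baseline drift.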
Overall, my study developed an open-source, lightweight peak-picking algorithm whose performance is comparable to that of proprietary solutions. I anticipate that this lightweight implementation can be widely integrated into downstream MS workflows to simplify tedious preprocessing steps.

