We present PeakSelect, a new preprocessing method that improves the accuracy and efficiency of peptide (protein) identification from tandem mass spectra. The fundamental difference between noise and fragment ions in a spectrum is that fragment ions produce isotope peaks, whereas noise does not. We propose a new concept, the Isotope Pattern Vector (IPV), which characterizes the isotope cluster of a fragment ion, so that noise and real peaks can be distinguished by their quantitative IPV values. PeakSelect first uses a new method based on a Gaussian mixture model and the Expectation-Maximization (EM) algorithm to find the base intensity level (baseline) of a spectrum. It then selects features based on the IPV and the baseline, and constructs a decision tree that automatically classifies peaks into categories such as noise, single ion peaks, and overlapping peaks. Experiments show that PeakSelect can help reduce Mascot search time and increase the reliability of peptide identifications. In particular, PeakSelect performs well on complex spectra with a large number of peaks from large peptides, and supports more sequence identifications than other well-known systems.
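
The abstract does not give the details of the baseline step, so the following is only a minimal sketch of the general idea, not PeakSelect's actual procedure. It assumes a two-component one-dimensional Gaussian mixture fitted by EM to log intensities, and takes the mean of the lower component as the baseline. The function names `fit_two_gaussians` and `estimate_baseline`, the log transform, the choice of two components, and the synthetic spectrum are all illustrative assumptions.

```python
import math
import random


def fit_two_gaussians(values, iterations=100):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only).

    Returns (weights, means, variances), ordered so that index 0 is the
    component with the lower mean.
    """
    lo, hi = min(values), max(values)
    # Assumed initialisation: means at the data extremes, one broad variance.
    means = [lo, hi]
    spread = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2.0 * variances[k]))
                / math.sqrt(2.0 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has effectively no members; leave as is
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a degenerate spike

    order = sorted(range(2), key=lambda k: means[k])
    return (
        [weights[k] for k in order],
        [means[k] for k in order],
        [variances[k] for k in order],
    )


def estimate_baseline(intensities):
    """Return the (geometric) mean intensity of the lower mixture component."""
    logs = [math.log(i) for i in intensities if i > 0]
    _, means, _ = fit_two_gaussians(logs)
    return math.exp(means[0])


# Synthetic spectrum (made-up numbers): many low peaks near intensity 10
# standing in for noise, and a few high peaks near intensity 1000.
rng = random.Random(0)
noise = [math.exp(rng.gauss(math.log(10.0), 0.2)) for _ in range(300)]
signal = [math.exp(rng.gauss(math.log(1000.0), 0.3)) for _ in range(60)]
baseline = estimate_baseline(noise + signal)
print(f"estimated baseline intensity: {baseline:.1f}")
```

In this sketch the baseline lands near the typical intensity of the low-intensity population. A real spectrum may need more components or a different transform, and the IPV-based features and decision tree described in the abstract are not modelled here.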


Source
http://dx.doi.org/10.1002/rcm.3488
