Merging Full-Spectrum and Fragment Ion Intensity Predictions from Deep Learning for High-Quality Spectral Libraries.

J Proteome Res

Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong 999077, China.

Published: December 2023

Spectral libraries are useful resources in proteomic data analysis. Recent advances in deep learning allow tandem mass spectra of peptides to be predicted from their amino acid sequences. This enables predicted spectral libraries to be compiled, and searching against such libraries has been shown to improve the sensitivity of peptide identification over conventional sequence database searching. However, current prediction models lack support for longer peptides, and thus far, predicted library searching has only been demonstrated for backbone ion-only spectrum prediction methods. Here, we propose a deep learning-based full-spectrum prediction method to generate predicted spectral libraries for peptide identification. We demonstrate that full-spectrum libraries outperform backbone ion-only prediction approaches in spectral library searching. Furthermore, merging spectra from different prediction models, as a form of ensemble learning, can produce spectral libraries with improved identification sensitivity. We also show that a hybrid library combining predicted and experimental spectra can yield 20% more confident identifications than either experimental library searching or sequence database searching.
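
The abstract does not say how spectra from different models are merged, how library matches are scored, or how the hybrid library is assembled. The Python sketch below is only an illustration under stated assumptions: peaks are (m/z, intensity) pairs, peaks from two predictions are paired when they lie within a fixed tolerance (0.02 Da here), paired intensities are averaged, unpaired peaks keep half their intensity, library matches are scored by cosine similarity, and experimental spectra take precedence over predicted ones in the hybrid library. The names `merge_spectra`, `cosine_score`, and `build_hybrid_library` are hypothetical and are not taken from the paper.

```python
from math import sqrt

# A spectrum is a list of (m/z, intensity) peaks. This representation is an assumption.
Spectrum = list[tuple[float, float]]


def normalize(peaks: Spectrum) -> Spectrum:
    """Scale intensities so that the most intense peak equals 1.0."""
    top = max(intensity for _, intensity in peaks)
    return [(mz, intensity / top) for mz, intensity in peaks]


def merge_spectra(a: Spectrum, b: Spectrum, tol: float = 0.02) -> Spectrum:
    """Average two predicted spectra peak by peak (a simple two-model ensemble).

    Peaks within `tol` Da are paired and averaged; an unpaired peak is
    averaged with zero, so it keeps half of its normalized intensity.
    """
    a, b = sorted(normalize(a)), sorted(normalize(b))
    merged: Spectrum = []
    i = j = 0
    while i < len(a) and j < len(b):
        (mz_a, int_a), (mz_b, int_b) = a[i], b[j]
        if abs(mz_a - mz_b) <= tol:
            merged.append(((mz_a + mz_b) / 2, (int_a + int_b) / 2))
            i += 1
            j += 1
        elif mz_a < mz_b:
            merged.append((mz_a, int_a / 2))
            i += 1
        else:
            merged.append((mz_b, int_b / 2))
            j += 1
    merged.extend((mz, intensity / 2) for mz, intensity in a[i:])
    merged.extend((mz, intensity / 2) for mz, intensity in b[j:])
    return merged


def cosine_score(query: Spectrum, library: Spectrum, tol: float = 0.02) -> float:
    """Cosine similarity between a query spectrum and a library spectrum."""
    q, lib = sorted(query), sorted(library)
    dot = 0.0
    i = j = 0
    while i < len(q) and j < len(lib):
        diff = q[i][0] - lib[j][0]
        if abs(diff) <= tol:
            dot += q[i][1] * lib[j][1]
            i += 1
            j += 1
        elif diff < 0:
            i += 1
        else:
            j += 1
    norm = sqrt(sum(x * x for _, x in q)) * sqrt(sum(x * x for _, x in lib))
    return dot / norm if norm else 0.0


def build_hybrid_library(
    experimental: dict[str, Spectrum], predicted: dict[str, Spectrum]
) -> dict[str, Spectrum]:
    """Combine libraries keyed by peptide; experimental entries override predicted ones."""
    return {**predicted, **experimental}


# Toy peak lists (made-up values, for illustration only).
backbone_only = [(175.119, 0.8), (304.161, 1.0)]
full_spectrum = [(175.118, 1.0), (129.102, 0.3), (304.162, 0.5)]
merged = merge_spectra(backbone_only, full_spectrum)
```

In this toy case the two backbone peaks are paired and averaged, while the extra peak that only the full-spectrum prediction contains survives at half intensity. A weighted average or a different matching rule would be equally plausible choices for the merge.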

Source
http://dx.doi.org/10.1021/acs.jproteome.3c00180

Publication Analysis

Top Keywords

spectral libraries (20)
library searching (12)
deep learning (8)
peptides predicted (8)
predicted spectral (8)
peptide identification (8)
sequence database (8)
database searching (8)
prediction models (8)
backbone ion-only (8)

Similar Publications

Polycyclic aromatic hydrocarbons (PAHs) are toxic contaminants with a widespread presence in diverse environmental contexts. Transformation processes of PAHs via degradation and biotransformation have parallels in humans, animals, plants, fungi, and bacteria. Mapping the transformation products of PAHs is therefore crucial for assessing their toxicological impact and developing effective monitoring strategies.

Developing Chemical Signatures for Categories of Household Consumer Products Using Suspect Screening Analysis.

Environ Sci Technol

January 2025

Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States.

Consumer products are a major source of chemicals that may pose a health risk. It is important to understand what chemicals are in these products to evaluate risk and assess new products for uncommon ingredients. Suspect screening analysis (SSA) using two-dimensional gas chromatography-high-resolution-time-of-flight/mass spectrometry (GCxGC-HR-TOF/MS) was applied to 92 consumer products from 5 categories.

PAH-Finder: A Pattern Recognition Workflow for Identification of PAHs and Their Derivatives.

Anal Chem

January 2025

Particle Pollution and Prevention (LAP3), Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China.

Polycyclic aromatic hydrocarbons (PAHs) are pervasive environmental pollutants with significant health risks due to their carcinogenic, mutagenic, and teratogenic properties. Traditional methods for PAH identification, primarily relying on gas chromatography-mass spectrometry (GC-MS), utilize spectral library searches together with other techniques, such as mass defect analysis. However, these methods are limited by incomplete spectral libraries and a high false positive rate.

NIST Mass Spectral Libraries in the Context of the Circular Economy of Plastics.

J Am Soc Mass Spectrom

January 2025

Mass Spectrometry Data Center, Biomolecular Measurement Division, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland, 20899, United States.

Article Synopsis
  • The Mass Spectrometry Data Center (MSDC) is enhancing libraries for identifying plastics-related compounds (PRC) and materials (PRM) as part of NIST's circular economy initiative.
  • To increase the diversity of compounds analyzed, MSDC is utilizing three ionization methods: EI, ESI, and APCI, along with pyrolysis-gas chromatography (py-GC-MS) for solid materials.
  • Collaborating with agencies like the FDA and EPA, they are testing these libraries to address health risks and environmental issues concerning plastics.

diaTracer enables spectrum-centric analysis of diaPASEF proteomics data.

Nat Commun

January 2025

Gilbert S. Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.

Data-independent acquisition has become a widely used strategy for peptide and protein quantification in liquid chromatography-tandem mass spectrometry-based proteomics studies. The integration of ion mobility separation into data-independent acquisition analysis, such as the diaPASEF technology available on Bruker's timsTOF platform, further improves the quantification accuracy and protein depth achievable using data-independent acquisition. We introduce diaTracer, a spectrum-centric computational tool optimized for diaPASEF data.
