Bidirectional de novo peptide sequencing using a transformer model.

PLoS Comput Biol

Center for Biomedical Computing, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea.

Published: February 2024

Identifying peptide sequences is a crucial task in proteomics. De novo sequencing methods have been widely employed for this purpose, and numerous tools have been proposed over the past two decades. Recently, deep learning approaches have been introduced for de novo sequencing. Previous methods focused on encoding tandem mass spectra and predicting peptide sequences from the first amino acid onwards. However, because b-ion (or a- or c-ion) and y-ion (or x- or z-ion) fragments coexist in tandem mass spectra, a peptide sequence can be predicted not only from the first amino acid but also from the last. Therefore, it is essential to predict peptide sequences bidirectionally. Our approach, called NovoB, uses a Transformer model to predict peptide sequences bidirectionally, starting from both the first and the last amino acids. Compared with Casanovo, our method improved the average peptide-level accuracy by approximately 9.8% across all species.
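The bidirectional argument rests on a simple mass relationship: consecutive b-ions differ by one residue counted from the N-terminus (the first amino acid), while consecutive y-ions differ by one residue counted from the C-terminus (the last amino acid). The sketch below is only an illustration of that relationship, not NovoB's Transformer model. It assumes an ideal spectrum with complete, noise-free, singly charged b- and y-ion ladders, unmodified residues, and standard monoisotopic masses; the function names `b_ions`, `y_ions`, and `read_ladder` and the 0.02 Da tolerance are hypothetical choices made for this example.

```python
from __future__ import annotations

# Standard monoisotopic residue masses in Da (Cys unmodified).
# Isoleucine (I) is omitted because its mass equals leucine (L).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # mass of a proton, Da
WATER = 18.010565  # monoisotopic mass of H2O, Da


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1) m/z values (N-terminal fragments)."""
    ladder, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ladder.append(total + PROTON)
    return ladder


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1..y(n-1) m/z values (C-terminal fragments)."""
    ladder, total = [], WATER
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ladder.append(total + PROTON)
    return ladder


def read_ladder(ladder: list[float], start: float, tol: float = 0.02) -> str:
    """Map each gap between consecutive ladder peaks to the closest residue."""
    residues, prev = [], start
    for mz in ladder:
        gap = mz - prev
        aa = min(RESIDUE_MASS, key=lambda r: abs(RESIDUE_MASS[r] - gap))
        if abs(RESIDUE_MASS[aa] - gap) > tol:
            raise ValueError(f"no residue matches mass gap {gap:.4f}")
        residues.append(aa)
        prev = mz
    return "".join(residues)


if __name__ == "__main__":
    peptide = "PEPTYDK"
    # b-ions read the sequence from the first amino acid onwards.
    print(read_ladder(b_ions(peptide), PROTON))          # PEPTYD
    # y-ions read it backwards from the last amino acid.
    print(read_ladder(y_ions(peptide), WATER + PROTON))  # KDYTPE
```

Each ladder alone recovers all but one terminal residue, and the two ladders start from opposite ends. In real spectra, peaks are missing, noisy, and multiply charged, which is why learned models are used instead of this exact-gap lookup.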


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10901305
DOI: http://dx.doi.org/10.1371/journal.pcbi.1011892

