AI Article Synopsis

  • Protein sequence databases, particularly UniProtKB, underpin mass-spectrometry-based proteomics: they supply the sequences that spectrum identification search engines use to identify peptides.
  • MaCPepDB, a new database, contains all tryptic peptides derived from the Swiss-Prot and TrEMBL sections of UniProtKB. It supports queries by peptide sequence, by protein association (including whether a peptide is unique to one protein), and by peptide mass, which aids targeted proteomics.
  • With over 5.9 billion peptides and typical query times under 1 s, MaCPepDB can also take posttranslational modifications into account in searches.

Article Abstract

Protein sequence databases play a crucial role in most current mass-spectrometry-based proteomics workflows. UniProtKB is one of the major sources, as it combines the information of several smaller databases and enriches the entries with additional biological information. For the identification of peptides in a sample from tandem mass spectra, as generated by data-dependent acquisition, protein sequence databases provide the basis for most spectrum identification search engines. In addition, targeted proteomics approaches such as selected reaction monitoring (SRM) and parallel reaction monitoring (PRM) require knowledge of the peptide sequences, their masses, and whether they are unique to a protein. Because most bottom-up proteomics approaches use trypsin to cleave the proteins in a sample, the tryptic peptides contained in a protein database are of great interest. We present a database, called MaCPepDB (mass-centric peptide database), that consists of the complete tryptic digest of the Swiss-Prot and TrEMBL parts of UniProtKB. The database is designed to allow queries by peptide sequence, returning the connected proteins and thus whether a peptide is unique, as well as queries by specific masses of peptides or of precursors of MS/MS spectra. Furthermore, posttranslational modifications can be considered in a query, as can different mass deviations for those modifications. Hence a sequence query can be used, for example, to check in which proteins of the UniProt database a tryptic peptide occurs, and a mass query can be used to find possibly interfering peptides in PRM/SRM experiments. The complete database currently contains 5 939 244 990 peptides from 185 561 610 proteins (UniProt version 2020_03), and a single query usually takes less than 1 s. For easy exploration of the data, a web interface was developed. A REST application programming interface (API) for programmatic and workflow access is also available at https://macpepdb.mpc.rub.de.
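
The ideas in the abstract (tryptic digestion, peptide-to-protein uniqueness, and mass queries) can be illustrated with a minimal Python sketch. This is not MaCPepDB's implementation or its API: the function names, the toy accessions and sequences, and the 10 ppm default tolerance are all assumptions made for illustration, and the sketch uses the common trypsin rule (cleave after K or R, but not before P) with standard monoisotopic residue masses and no modifications.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # H2O added once per peptide for the free termini


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal remainder
    # Join neighbouring pieces to model up to `missed_cleavages` misses.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def monoisotopic_mass(peptide):
    """Unmodified, neutral monoisotopic mass of a peptide in Da."""
    return sum(MONO_MASS[residue] for residue in peptide) + WATER


def build_index(proteins, missed_cleavages=0):
    """Map each peptide to the set of protein accessions containing it."""
    index = defaultdict(set)
    for accession, sequence in proteins.items():
        for peptide in tryptic_digest(sequence, missed_cleavages):
            index[peptide].add(accession)
    return index


def query_mass(index, target_mass, tolerance_ppm=10.0):
    """Return indexed peptides whose mass lies within the ppm tolerance."""
    hits = []
    for peptide in index:
        deviation_ppm = abs(monoisotopic_mass(peptide) - target_mass) / target_mass * 1e6
        if deviation_ppm <= tolerance_ppm:
            hits.append(peptide)
    return sorted(hits)


# Toy input: accessions and sequences are made up for this example.
proteins = {"P_TOY1": "AGKRPSTKLM", "P_TOY2": "AGKWWK"}
index = build_index(proteins)
shared = index["AGK"]            # found in both toy proteins, so not unique
unique = len(index["RPSTK"]) == 1
candidates = query_mass(index, 274.1641)
```

A sequence query then amounts to a lookup in `index`, and a mass query to a tolerance scan over peptide masses. A database at the scale described in the abstract would need a mass-ordered or partitioned store rather than the linear scan used here.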

Source
http://dx.doi.org/10.1021/acs.jproteome.0c00967

Publication Analysis

Top Keywords

tryptic peptides: 8
protein sequence: 8
sequence databases: 8
proteomics approaches: 8
reaction monitoring: 8
peptide sequences: 8
allow queries: 8
posttranslational modifications: 8
proteins uniprot: 8
database: 7
