Unipept, a pioneering software tool in metaproteomics, has significantly advanced the analysis of complex ecosystems by facilitating both taxonomic and functional insights from environmental samples. From the outset, Unipept focused on tryptic peptides, exploiting the predictability and consistency of trypsin digestion to efficiently construct a protein reference database. However, the evolving landscape of proteomics and emerging fields like immunopeptidomics necessitate a more versatile approach that extends beyond the analysis of tryptic peptides.
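The predictability of trypsin digestion mentioned above can be illustrated with a minimal sketch. It assumes the commonly used rule that trypsin cleaves after K or R unless the next residue is P; the function name `tryptic_digest` and the `missed_cleavages` parameter are hypothetical and are not taken from Unipept.

```python
def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """In-silico digestion under the assumed rule: cleave after K/R, not before P."""
    pieces = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end in K/R.
    if start < len(protein):
        pieces.append(protein[start:])

    # Join up to `missed_cleavages` adjacent fragments to model incomplete digestion.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKRPAAKCC"))
    print(tryptic_digest("MKRPAAKCC", missed_cleavages=1))
```

Because the cleavage rule is deterministic, every protein maps to a fixed set of peptides, which is what makes a precomputed peptide index feasible; non-tryptic peptides have no such rule to rely on.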
Recent improvements in methods and instruments used in mass spectrometry have greatly enhanced the detection of protein post-translational modifications (PTMs). On the computational side, the adoption of open modification search strategies now allows for the identification of a wide variety of PTMs, potentially revealing hundreds to thousands of distinct modifications in biological samples. While the observable part of the proteome is continuously growing, the visualization and interpretation of this vast amount of data in a comprehensive fashion is not yet possible.
Metaproteomics has become a crucial omics technology for studying microbiomes. In this area, the Unipept ecosystem, accessible at https://unipept.ugent.
Single-cell proteomics can offer valuable insights into dynamic cellular interactions, but identifying proteins at this level is challenging due to their low abundance. In this chapter, we present a state-of-the-art bioinformatics pipeline for single-cell proteomics that combines the search engine Sage (via SearchGUI), identification rescoring with MS²Rescore, quantification through FlashLFQ, and differential expression analysis using MSqRob2. MS²Rescore leverages LC-MS/MS behavior predictors, such as MS²PIP and DeepLC, to recalibrate scores with Percolator or mokapot.
In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes that are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of cross-linked peptides with oligonucleotides of different length leads to a combinatorial increase in search space.
The use of collision cross section (CCS) values derived from ion mobility studies is proving to be an increasingly important tool in the characterization and identification of molecules detected in complex mixtures. Here, a novel machine learning (ML) based method for predicting CCS integrating both molecular modeling (MM) and ML methodologies has been devised and shown to be able to accurately predict CCS values for singly charged small molecular weight molecules from a broad range of chemical classes. The model performed favorably compared to existing models, improving compound identifications for isobaric analytes in terms of ranking and assigning identification probability values to the annotation.
Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS²Rescore 3.
Human leukocyte antigen (HLA) class I peptide ligands (HLAIps) are key targets for developing vaccines and immunotherapies against infectious pathogens or cancer cells. Identifying HLAIps is challenging due to their high diversity, low abundance, and patient individuality. Here, we develop a highly sensitive method for identifying HLAIps using liquid chromatography-ion mobility-tandem mass spectrometry (LC-IMS-MS/MS).
Motivation: Protein networks are commonly used for understanding how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more interactions. To uncover functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics.
In the era of open-modification search engines, more posttranslational modifications than ever can be detected by LC-MS/MS-based proteomics. This development can switch proteomics research into a higher gear, as PTMs are key in many cellular pathways important in cell proliferation, migration, metastasis, and aging. However, despite these advances in modification identification, statistical methods for PTM-level quantification and differential analysis have yet to catch up.
Background: It is increasingly recognized that conventional food production systems cannot meet the globally increasing demand for protein, resulting in overexploitation and depletion of resources, and environmental degradation. In this context, microbial biomass has emerged as a promising sustainable protein alternative. Nevertheless, the fact that cultivation conditions affect the composition of microbial cells, and hence their quality and nutritional value, is often not considered.
Unipept Desktop 2.0 is the most recent iteration of the Unipept Desktop tool that adds support for the analysis of metaproteogenomics datasets. Unipept Desktop now supports the automatic construction of targeted protein reference databases that only contain proteins (originating from the UniProtKB resource) associated with a predetermined list of taxa.
Interest in the use of machine learning for peptide fragmentation spectrum prediction has risen strongly over the past years, especially for applications in challenging proteomics identification workflows such as immunopeptidomics and the full-proteome identification of data-independent acquisition spectra. Since its inception, the MS²PIP peptide spectrum predictor has been widely used for various downstream applications, mostly thanks to its accuracy, ease of use, and broad applicability. We here present a thoroughly updated version of the MS²PIP web server, which includes new and more performant prediction models for both tryptic and non-tryptic peptides, for immunopeptides, and for CID-fragmented TMT-labeled peptides.
Motivation: Inferring taxonomy in mass spectrometry-based shotgun proteomics is a complex task. In multi-species or viral samples of unknown taxonomic origin, the presence of proteins and corresponding taxa must be inferred from a list of identified peptides, which is often complicated by protein homology: many proteins do not only share peptides within a taxon but also between taxa. However, the correct taxonomic inference is crucial when identifying different viral strains with high-sequence homology-considering, e.
Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot and tissue/cell type annotation was manually added. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types.
In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs.
Reliable peptide identification is key in mass spectrometry (MS) based proteomics. To this end, the target decoy approach (TDA) has become the cornerstone for extracting a set of reliable peptide-to-spectrum matches (PSMs) that will be used in downstream analysis. Indeed, TDA is now the default method to estimate the false discovery rate (FDR) for a given set of PSMs, and users typically view it as a universal solution for assessing the FDR in the peptide identification step.
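As a rough illustration of how TDA yields an FDR estimate, the sketch below uses the common simplification that the number of false target PSMs above a score threshold is approximated by the number of decoy PSMs above it. The estimator (decoys divided by targets), the function name `tda_fdr`, and the `(score, is_decoy)` input layout are assumptions made for this example; they are not taken from the article, and other TDA variants exist.

```python
from typing import Iterable, Tuple


def tda_fdr(psms: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    Each PSM is a (score, is_decoy) pair from a search against a combined
    target and decoy database (hypothetical input layout).
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score < threshold:
            continue
        if is_decoy:
            decoys += 1
        else:
            targets += 1
    if targets == 0:
        # No accepted targets: report 0.0 rather than dividing by zero.
        return 0.0
    # Decoy hits stand in for the unknown number of false target hits.
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
    print(tda_fdr(example, threshold=7.0))
```

The estimate is only as good as the assumption that decoys behave like incorrect target matches, which is why treating it as a universal solution deserves scrutiny.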
A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines.
The pandemic readiness toolbox needs to be extended, targeting different biomolecules, using orthogonal experimental set-ups. Here, we build on our Cov-MS effort using LC-MS, adding SISCAPA technology to enrich proteotypic peptides of the SARS-CoV-2 nucleocapsid (N) protein from trypsin-digested patient samples. The Cov²MS assay is compatible with most matrices including nasopharyngeal swabs, saliva, and plasma and has increased sensitivity into the attomole range, a 1000-fold improvement compared to direct detection in a matrix.
The holistic nature of omics studies makes them ideally suited to generate hypotheses on health and disease. Sequencing-based genomics and mass spectrometry (MS)-based proteomics are linked through epigenetic regulation mechanisms. However, epigenomics is currently mainly focused on DNA methylation status using sequencing technologies, while studying histone posttranslational modifications (hPTMs) using MS is lagging, partly because reuse of raw data is impractical.