Proteogenomics is a growing "multi-omics" research area that combines mass spectrometry-based proteomics and high-throughput nucleotide sequencing technologies. Proteogenomics has helped in genomic annotation for organisms whose complete genome sequences became available by using high-throughput DNA sequencing technologies. Apart from genome annotation, this multi-omics approach has also helped researchers confirm expression of variant proteins belonging to unique proteoforms that could have resulted from single-nucleotide polymorphism (SNP), insertion and deletions (Indels), splice isoforms, or other genome or transcriptome variations.
View Article and Find Full Text PDFUnlabelled: Proteomics has emerged as a powerful tool for studying cancer biology, developing diagnostics, and therapies. With the continuous improvement and widespread availability of high-throughput proteomic technologies, the generation of large-scale proteomic data has become more common in cancer research, and there is a growing need for resources that support the sharing and integration of multi-omics datasets. Such datasets require extensive metadata including clinical, biospecimen, and experimental and workflow annotations that are crucial for data interpretation and reanalysis.
View Article and Find Full Text PDFRING-between-RING (RBR) E3 ligases mediate ubiquitin transfer through an obligate E3-ubiquitin thioester intermediate prior to substrate ubiquitination. Although RBRs share a conserved catalytic module, substrate recruitment mechanisms remain enigmatic, and the relevant domains have yet to be identified for any member of the class. Here we characterize the interaction between the auto-inhibited RBR, HHARI (AriH1), and its target protein, 4EHP, using a combination of XL-MS, HDX-MS, NMR, and biochemical studies.
View Article and Find Full Text PDFWe performed the first proteogenomic study on a prospectively collected colon cancer cohort. Comparative proteomic and phosphoproteomic analysis of paired tumor and normal adjacent tissues produced a catalog of colon cancer-associated proteins and phosphosites, including known and putative new biomarkers, drug targets, and cancer/testis antigens. Proteogenomic integration not only prioritized genomically inferred targets, such as copy-number drivers and mutation-derived neoantigens, but also yielded novel findings.
View Article and Find Full Text PDFAlternative splicing dramatically increases transcriptome complexity but its contribution to proteome diversity remains controversial. Exon-exon junction spanning peptides provide direct evidence for the translation of specific splice isoforms and are critical for delineating protein isoform complexity. Here we found that junction-spanning peptides are underrepresented in publicly available mass spectrometry-based shotgun proteomics data sets.
View Article and Find Full Text PDFProteogenomics has emerged as a valuable approach in cancer research, which integrates genomic and transcriptomic data with mass spectrometry-based proteomics data to directly identify expressed, variant protein sequences that may have functional roles in cancer. This approach is computationally intensive, requiring integration of disparate software tools into sophisticated workflows, challenging its adoption by nonexpert, bench scientists. To address this need, we have developed an extensible, Galaxy-based resource aimed at providing more researchers access to, and training in, proteogenomic informatics.
View Article and Find Full Text PDFBackground And Aims: Proteomics holds promise for individualizing cancer treatment. We analyzed to what extent the proteomic landscape of human colorectal cancer (CRC) is maintained in established CRC cell lines and the utility of proteomics for predicting therapeutic responses.
Methods: Proteomic and transcriptomic analyses were performed on 44 CRC cell lines, compared against primary CRCs (n=95) and normal tissues (n=60), and integrated with genomic and drug sensitivity data.
Lipid identification from data produced with high-throughput technologies is essential to the elucidation of the roles played by lipids in cellular function and disease. Software tools for identifying lipids from tandem mass (MS/MS) spectra have been developed, but they are often costly or lack the sophistication of their proteomics counterparts. We have developed Greazy, an open source tool for the automated identification of phospholipids from MS/MS spectra, that utilizes methods similar to those developed for proteomics.
View Article and Find Full Text PDFBackground: More than half of intravenous drug users (IDUs) in China suffer from the Hepatitis C virus (HCV). The virus is also more prevalent in non-injection drug users (NIDUs) than in the general population. However, not much is known about HCV subtype distribution in these populations.
View Article and Find Full Text PDFQuestions concerning longitudinal data quality and reproducibility of proteomic laboratories spurred the Protein Research Group of the Association of Biomolecular Resource Facilities (ABRF-PRG) to design a study to systematically assess the reproducibility of proteomic laboratories over an extended period of time. Developed as an open study, initially 64 participants were recruited from the broader mass spectrometry community to analyze provided aliquots of a six bovine protein tryptic digest mixture every month for a period of nine months. Data were uploaded to a central repository, and the operators answered an accompanying survey.
View Article and Find Full Text PDFMotivation: Systematic bias in mass measurement adversely affects data quality and negates the advantages of high precision instruments.
Results: We introduce the mzRefinery tool for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen.
Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and perform integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants.
View Article and Find Full Text PDFShotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, we have created a robust multivariate statistical toolkit that accommodates the correlation structure of these metrics and allows for hierarchical relationships among data sets.
View Article and Find Full Text PDFThe proteome informatics research group of the Association of Biomolecular Resource Facilities conducted a study to assess the community's ability to detect and characterize peptides bearing a range of biologically occurring post-translational modifications when present in a complex peptide background. A data set derived from a mixture of synthetic peptides with biologically occurring modifications combined with a yeast whole cell lysate as background was distributed to a large group of researchers and their results were collectively analyzed. The results from the twenty-four participants, who represented a broad spectrum of experience levels with this type of data analysis, produced several important observations.
View Article and Find Full Text PDFDifferentiating and quantifying protein differences in complex samples produces significant challenges in sensitivity and specificity. Label-free quantification can draw from two different information sources: precursor intensities and spectral counts. Intensities are accurate for calculating protein relative abundance, but values are often missing due to peptides that are identified sporadically.
View Article and Find Full Text PDFGenomics Proteomics Bioinformatics
April 2013
In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms.
View Article and Find Full Text PDFMass-spectrometry-based proteomics has become an important component of biological research. Numerous proteomics methods have been developed to identify and quantify the proteins in biological and clinical samples, identify pathways affected by endogenous and exogenous perturbations, and characterize protein complexes. Despite successes, the interpretation of vast proteomics datasets remains a challenge.
View Article and Find Full Text PDFLC-MS/MS-based proteomics studies rely on stable analytical system performance that can be evaluated by objective criteria. The National Institute of Standards and Technology (NIST) introduced the MSQC software to compute diverse metrics from experimental LC-MS/MS data, enabling quality analysis and quality control (QA/QC) of proteomics instrumentation. In practice, however, several attributes of the MSQC software prevent its use for routine instrument monitoring.
View Article and Find Full Text PDFSpectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library.
View Article and Find Full Text PDFThe large amount of data produced by proteomics experiments requires effective bioinformatics tools for the integration of data management and data analysis. Here we introduce a suite of tools developed at Vanderbilt University to support production proteomics. We present the Backup Utility Service tool for automated instrument file backup and the ScanSifter tool for data conversion.
View Article and Find Full Text PDFIn shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments.
View Article and Find Full Text PDFToxicoproteomic samples are rich in posttranslational modifications (PTMs) of proteins. Identifying these modifications via standard database searching can incur significant performance penalties. Here, we describe the latest developments in TagRecon, an algorithm that leverages inferred sequence tags to identify modified peptides in toxicoproteomic data sets.
View Article and Find Full Text PDFShotgun proteomics produces collections of tandem mass spectra that contain all the data needed to identify mutated peptides from clinical samples. Identifying these sequence variations, however, has not been feasible with conventional database search strategies, which require exact matches between observed and expected sequences. Searching for mutations as mass shifts on specified residues through database search can incur significant performance penalties and generate substantial false positive rates.
View Article and Find Full Text PDF