Accurate Estimation of Context-Dependent False Discovery Rates in Top-Down Proteomics.

Mol Cell Proteomics

From the ‡Proteomics Center of Excellence, Northwestern University, Evanston, Illinois; §Department of Molecular Biosciences, Northwestern University, Evanston, Illinois; Department of Chemistry and the Feinberg School of Medicine, Northwestern University, Evanston, Illinois.

Published: April 2019

Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD_FDR_Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.
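To illustrate why an FDR computed at the PrSM level can under-report the FDR at the protein level, the following is a minimal sketch of a generic target-decoy estimate applied at both levels. It is not the TDCD_FDR_Calculator's algorithm; the data, field names (`protein`, `score`, `is_decoy`), the score threshold, and the helper functions are all hypothetical and chosen only for illustration.

```python
# Generic target-decoy FDR sketch (an assumption, not the paper's method).
# All records, names, and the threshold below are invented toy values.

def decoy_fdr(items):
    """Estimate FDR as (number of decoys) / (number of targets) among accepted items."""
    decoys = sum(1 for it in items if it["is_decoy"])
    targets = len(items) - decoys
    return decoys / targets if targets else 0.0

def best_per_protein(prsms):
    """Collapse PrSMs to one entry per protein, keeping the highest-scoring PrSM."""
    best = {}
    for p in prsms:
        key = p["protein"]
        if key not in best or p["score"] > best[key]["score"]:
            best[key] = p
    return list(best.values())

def make(protein, scores, is_decoy=False):
    """Build toy PrSM records for one protein entry."""
    return [{"protein": protein, "score": s, "is_decoy": is_decoy} for s in scores]

# True proteins tend to collect many PrSMs; random decoy hits tend to be singletons.
prsms = (
    make("P1", [90, 88, 85, 80, 75])
    + make("P2", [70, 68, 66, 64])
    + make("P3", [60])
    + make("DECOY_1", [62], is_decoy=True)
    + make("DECOY_2", [20], is_decoy=True)
)

threshold = 50  # hypothetical score cutoff
accepted = [p for p in prsms if p["score"] >= threshold]

prsm_fdr = decoy_fdr(accepted)                       # 1 decoy / 10 target PrSMs
protein_fdr = decoy_fdr(best_per_protein(accepted))  # 1 decoy / 3 target proteins

print(f"PrSM-level FDR:    {prsm_fdr:.3f}")
print(f"Protein-level FDR: {protein_fdr:.3f}")
```

In this toy case the same accepted set gives a higher estimate once PrSMs are collapsed to proteins, because many target PrSMs map to a few proteins while each decoy hit counts as its own entry. The size of the gap depends entirely on the invented data and says nothing about the 24-fold figure reported above.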

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6442365
DOI: http://dx.doi.org/10.1074/mcp.RA118.000993

Publication Analysis

Top Keywords (keyword: count)

top-down proteomics: 16
false discovery: 8
fdr calculation: 8
search engine: 8
fdr: 6
top-down: 5
search: 5
accurate estimation: 4
estimation context-dependent: 4
context-dependent false: 4

Similar Publications

Proteoform Identification and Quantification Based on Alignment Graphs.

Bioinformatics

January 2025

Department of Computer Science, City University of Hong Kong, Hong Kong, China.

Motivation: Proteoforms are the different forms of a protein generated from the genome with various sequence variations, splice isoforms, and post-translational modifications. Proteoforms regulate protein structures and functions. A single protein can have multiple proteoforms due to different modification sites.

Proteo-SAFARI is a Shiny application for fragment assignment by relative isotopes, an R-based software application designed for identification of protein fragment ions directly in the m/z domain. This program provides an open-source, user-friendly application for identification of fragment ions from a candidate protein sequence with support for custom covalent modifications and various visualizations of identified fragments. Additionally, Proteo-SAFARI includes a nonnegative least-squares fitting approach to determine the contributions of various hydrogen shifted fragment ions ( + 1, + 1, - 1, - 2) observed in UVPD mass spectra which exhibit overlapping isotopic distributions.

Intact protein analysis using mass spectrometry (MS) is an important technique to characterize and provide a comprehensive overview of protein complexity. It is also the basis of "top-down" approaches in proteomics to describe the proteoforms arising from a single protein's post-translational modifications (PTMs). MS-based analysis of intact proteins benefits from high-resolution separations prior to electrospray ionization.

Protein "purity," proteoforms, and the albuminome: critical observations on proteome and systems complexity.

Front Cell Dev Biol

December 2024

Proteomics, Lipidomics and Metabolomics Core Facility, School of Life Sciences, Faculty of Science, University of Technology Sydney, Ultimo, NSW, Australia.

Introduction: The identification of effective, selective biomarkers and therapeutics is dependent on truly deep, comprehensive analysis of proteomes at the proteoform level.

Methods: Bovine serum albumin (BSA) isolated by two different protocols, cold ethanol fractionation and heat shock fractionation, was resolved and identified using Integrative Top-down Proteomics, the tight coupling of two-dimensional gel electrophoresis (2DE) with liquid chromatography and tandem mass spectrometry (LC-MS/MS).

Results and Discussion: Numerous proteoforms were identified in both "purified" samples, across a broad range of isoelectric points and molecular weights.

S-glutathionylation (SSG) is increasingly recognized as a critical signaling mechanism in the heart, yet SSG modifications in cardiac sarcomeric proteins remain understudied. Here we identified SSG of the ventricular isoform of myosin light chain 1 (MLC-1v) in human, swine, and mouse cardiac tissues using top-down mass spectrometry (MS)-based proteomics. Our results enabled the accurate identification, quantification, and site-specific localization of SSG in MLC-1v across different species.
