Flexible Data Analysis Pipeline for High-Confidence Proteogenomics.

Hendrik Weisser, James C Wright, Jonathan M Mudge, Petra Gutenbrunner, Jyoti S Choudhary

J Proteome Res

School of Informatics, Communications, and Media, University of Applied Sciences Upper Austria, Hagenberg 4232, Austria.

Published: December 2016

Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are "novel" peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such "novel" peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools, developed to facilitate proteogenomic analyses, that extend the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome.
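
The core idea of screening for "novel" peptides can be illustrated with a minimal Python sketch. This is not the pipeline's implementation and does not use the OpenMS or Percolator APIs. The names `PeptideHit`, `normalize`, and `find_novel_peptides` are hypothetical, the 1% q-value cutoff is an assumed example threshold, and treating isoleucine and leucine as equivalent is an assumption based on the fact that the two residues have the same mass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one identified peptide."""
    sequence: str
    q_value: float  # assumed to come from a post-processing step


def normalize(sequence: str) -> str:
    # Assumption: I and L are treated as equivalent because they are isobaric.
    return sequence.upper().replace("I", "L")


def find_novel_peptides(hits, known_proteins, max_q_value=0.01):
    """Return unique peptide sequences that pass the q-value cutoff and
    do not occur as a substring of any known protein sequence."""
    known = [normalize(protein) for protein in known_proteins]
    seen = set()
    novel = []
    for hit in hits:
        if hit.q_value > max_q_value:
            continue  # not confident enough
        key = normalize(hit.sequence)
        if key in seen:
            continue  # already reported
        if any(key in protein for protein in known):
            continue  # explained by an annotated protein
        seen.add(key)
        novel.append(hit.sequence)
    return novel
```

A real analysis would apply considerably stricter criteria than this sketch, as the abstract indicates; the function only shows the two basic conditions of confidence filtering and absence from the annotated proteome.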

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5703597
DOI: http://dx.doi.org/10.1021/acs.jproteome.6b00765
