Background: Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the current annotation of the human genome.
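
A whole genome search needs a protein-like database built from the genome itself. One common way to build it is six-frame translation: both strands are translated in all three reading frames. The abstract does not describe how the whole genome database was constructed, so the Python sketch below is only an illustration under that assumption. The function names `six_frame_translation` and `segments_between_stops` and the `min_length` cutoff are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    # Translate complete codons only; codons containing non-ACGT bases become 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def segments_between_stops(protein: str, min_length: int = 7) -> list[str]:
    # Hypothetical cutoff: keep stop-free stretches long enough to contain a peptide.
    return [seg for seg in protein.split("*") if len(seg) >= min_length]


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"], frames["-1"])  # MA* LGH
```

Each stop-free segment could then be digested in silico and searched like an ordinary protein database; a real pipeline would also record each segment's genomic offset so that peptide hits can be placed back on the genome.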

Results: We generated ~1 million high-resolution tandem mass (MS/MS) spectra for the Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome and against the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing confidence in the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt.
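
The filtering and comparison steps described above can be sketched roughly as follows. This is not the authors' pipeline: it assumes a target-decoy strategy with the simple decoys/targets FDR estimate, assumes that search scores are directly comparable across the three databases (a simplification, since database size affects scoring), and uses a hypothetical `PSM` record and hypothetical function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """A peptide-spectrum match; field names are hypothetical."""
    spectrum_id: str
    peptide: str
    score: float    # assumed: higher is better and comparable across searches
    is_decoy: bool
    search: str     # "protein", "transcript" or "genome"


def filter_at_fdr(psms: list[PSM], fdr: float = 0.01) -> list[PSM]:
    """Keep the longest score-ranked prefix whose estimated FDR
    (decoys / targets) is <= fdr, then drop the decoys."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    cutoff = 0
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            cutoff = i
    return [p for p in ranked[:cutoff] if not p.is_decoy]


def best_match_per_spectrum(*searches: list[PSM]) -> dict[str, PSM]:
    """Keep the highest-scoring PSM for each spectrum across all searches."""
    best: dict[str, PSM] = {}
    for psms in searches:
        for psm in psms:
            current = best.get(psm.spectrum_id)
            if current is None or psm.score > current.score:
                best[psm.spectrum_id] = psm
    return best


def genome_only_peptides(best: dict[str, PSM], annotated_peptides: set[str]) -> set[str]:
    """Peptides whose best match came from the genome search and that were
    not identified in the protein or transcript searches."""
    return {p.peptide for p in best.values()
            if p.search == "genome" and p.peptide not in annotated_peptides}


# Toy data (made up): spectrum s1 matches better in the protein search.
protein_hits = filter_at_fdr([
    PSM("s1", "PEPTIDEK", 50.0, False, "protein"),
    PSM("s2", "KEDITPEP", 10.0, True, "protein"),
    PSM("s3", "AAAGLK", 40.0, False, "protein"),
])
genome_hits = filter_at_fdr([
    PSM("s1", "PEPTLDEK", 45.0, False, "genome"),
    PSM("s4", "NEWPEPK", 30.0, False, "genome"),
])
best = best_match_per_spectrum(protein_hits, genome_hits)
print(genome_only_peptides(best, {p.peptide for p in protein_hits}))  # {'NEWPEPK'}
```

Resolving each spectrum to a single best match is what lets a genome-only hit such as `NEWPEPK` stand out, while a weaker genome match to a spectrum already explained by an annotated protein is discarded.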

Conclusions: The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.
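
Deciding whether an identified peptide lies outside annotated exons is an interval-overlap question. The sketch below assumes each uniquely mapping peptide is given as one or more genomic blocks (more than one if it spans a splice junction), uses 0-based half-open coordinates, and treats a peptide as "outside" when none of its blocks overlaps any exon. The exon list stands in for a GENCODE annotation; all names and coordinates are hypothetical.

```python
import bisect

Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def build_exon_index(exons: list[Interval]) -> dict[str, tuple[list[int], list[int]]]:
    """Merge overlapping exons per chromosome into sorted, disjoint blocks."""
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in exons:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged: list[list[int]] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_exon(index, chrom: str, start: int, end: int) -> bool:
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # First merged block ending after `start`; it overlaps iff it begins before `end`.
    i = bisect.bisect_right(ends, start)
    return i < len(starts) and starts[i] < end


def peptides_outside_exons(peptide_blocks: dict[str, list[Interval]], index) -> list[str]:
    """Peptides none of whose genomic blocks overlap an annotated exon."""
    return [pep for pep, blocks in peptide_blocks.items()
            if not any(overlaps_exon(index, c, s, e) for c, s, e in blocks)]


exons = [("chr1", 0, 10), ("chr1", 20, 30), ("chr2", 100, 200)]
peptides = {
    "A": [("chr1", 2, 8)],                     # inside an exon
    "B": [("chr1", 12, 18)],                   # intronic
    "C": [("chr1", 5, 10), ("chr1", 20, 24)],  # spliced across two exons
    "D": [("chr3", 0, 30)],                    # no annotated exons on chr3
}
outside = peptides_outside_exons(peptides, build_exon_index(exons))
print(outside, f"{len(outside) / len(peptides):.0%} outside exons")  # ['B', 'D'] 50% outside exons
```

Merging exons first keeps each lookup to a single binary search, which matters when the annotation contains hundreds of thousands of overlapping exons from alternative transcripts.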

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3607840
DOI: http://dx.doi.org/10.1186/1471-2164-14-141

Similar Publications

Article Synopsis
  • Large-scale omics profiling has highlighted numerous somatic mutations and cancer-related proteins, but understanding their functions in cancer biology remains difficult.
  • The FunMap network was developed using machine learning on data from 1,194 individuals across 11 cancer types; it accurately links over 10,500 protein-coding genes and identifies important functional protein modules.
  • This study positions FunMap as a valuable tool for interpreting complex cancer data, helping to predict the roles of lesser-known cancer-associated proteins and to inform strategies for cancer treatment and research.

The integration of quantitative trait loci (QTLs) with disease genome-wide association studies (GWASs) has proven successful in prioritizing candidate genes at disease-associated loci. QTL mapping has so far focused on multi-tissue expression QTLs or plasma protein QTLs (pQTLs). We generated a cerebrospinal fluid (CSF) pQTL atlas by measuring 6,361 proteins in 3,506 samples.

The rapid improvements in next-generation sequencing technologies have made it possible to quickly access in-depth genome sequence data. This has resulted in a flurry of genome sequences of various organisms being published and made publicly available in the last two decades. However, not all organisms have genome sequence data available.

From Gene to Whole Cell: Modeling, Visualization, and Analysis.

Methods Mol Biol

October 2024

Life Science Informatics, Department of Computer and Information Science, University of Konstanz, Konstanz, Germany.

Proteogenomics combines proteomic and genetic data to gain new insights into molecular mechanisms. Here, we extend this approach toward structural biology from a tool perspective. The chapter starts with tools that can be used to explore genetic information and then enriches that information with proteomic data.

Comprehensive Peptide Mapping Is Crucial for Proteogenomics and Proteomics.

Methods Mol Biol

October 2024

Medical Informatics and Bioinformatics, Institute for Measurement Engineering and Sensor Technology, Hochschule Ruhr West, University of Applied Sciences, Mülheim adR., Germany.

Proteogenomics enables the confirmation and refinement of gene models, the detection of new ones, and the proposal of alternative transcripts using support at the protein level. Such evidence is usually generated using mass spectrometry and subsequent mapping of the results to various sequence databases. This workflow entails several problems: (1) to speed up the analysis, only a small set of expected proteins is searched; (2) database search tools generally do not provide mapping to the genome; and (3) with each new release of the sequence databases, all results would have to be regenerated at considerable cost.
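
Problem (2), the missing link from a database hit back to the genome, can be illustrated with a small coordinate-projection sketch. It assumes the identified peptide comes from a protein whose coding sequence (CDS) blocks are known, for example from an annotation's CDS features, that translation starts at the first CDS base, and that only the first occurrence of the peptide in the protein matters. Coordinates are 0-based half-open; the function names and the toy transcript are hypothetical.

```python
Block = tuple[int, int]  # genomic (start, end), 0-based half-open


def cds_interval_to_genome(cds_blocks: list[Block], strand: str,
                           cds_start: int, cds_end: int) -> list[Block]:
    """Project a half-open interval of the coding sequence onto genomic blocks.

    cds_blocks are the coding parts of the exons in ascending genomic order;
    on the '-' strand the CDS is read from the highest coordinate downward.
    """
    ordered = cds_blocks if strand == "+" else list(reversed(cds_blocks))
    result: list[Block] = []
    offset = 0  # CDS position at which the current block begins
    for g_start, g_end in ordered:
        length = g_end - g_start
        lo, hi = max(cds_start, offset), min(cds_end, offset + length)
        if lo < hi:
            if strand == "+":
                result.append((g_start + lo - offset, g_start + hi - offset))
            else:
                result.append((g_end - (hi - offset), g_end - (lo - offset)))
        offset += length
    return sorted(result)


def peptide_to_genome(peptide: str, protein: str, cds_blocks: list[Block], strand: str) -> list[Block]:
    """Genomic blocks encoding the first occurrence of `peptide` in `protein` ([] if absent)."""
    aa_start = protein.find(peptide)
    if aa_start < 0:
        return []
    return cds_interval_to_genome(cds_blocks, strand, 3 * aa_start, 3 * (aa_start + len(peptide)))


# Toy transcript: a 6-codon CDS split across two exons (coordinates are made up).
cds = [(100, 106), (200, 212)]
print(peptide_to_genome("KLV", "MKLVAG", cds, "+"))  # [(103, 106), (200, 206)]
print(peptide_to_genome("VAG", "MKLVAG", cds, "-"))  # [(100, 106), (200, 203)]
```

Because the projection only needs the CDS structure, it can be recomputed cheaply for a new annotation release, which touches on problem (3) as well, although the peptide identifications themselves would still depend on the searched database.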
