Background: Proteogenomic mapping is an approach that uses mass spectrometry data from proteins to directly map protein-coding genes and could aid in locating translational regions in the human genome. In concert with the ENcyclopedia of DNA Elements (ENCODE) project, we applied proteogenomic mapping to produce proteogenomic tracks for the UCSC Genome Browser, to explore which putative translational regions may be missing from the human genome.
Results: We generated ~1 million high-resolution tandem mass (MS/MS) spectra for the Tier 1 ENCODE cell lines K562 and GM12878 and mapped them against the UCSC hg19 human genome and the GENCODE V7 annotated protein and transcript sets. We then compared the results from the three searches to identify the best-matching peptide for each MS/MS spectrum, thereby increasing confidence in the putative new protein-coding regions found via the whole genome search. At a 1% false discovery rate, we identified 26,472, 24,406, and 13,128 peptides from the protein, transcript, and whole genome searches, respectively; of these, 481 were found solely via the whole genome search. The proteogenomic mapping data are available on the UCSC Genome Browser at http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeUncBsuProt.
Conclusions: The whole genome search revealed that ~4% of the uniquely mapping identified peptides were located outside GENCODE V7 annotated exons. The comparison of the results from the disparate searches also identified 15% more spectra than would have been found solely from a protein database search. Therefore, whole genome proteogenomic mapping is a complementary method for genome annotation when performed in conjunction with other searches.
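The comparison of the three searches described in the Results can be illustrated with a minimal sketch. It is not the authors' pipeline: the tuple layout, the search labels, the "higher score is better" convention, and the tie-breaking rule (prefer protein, then transcript, then whole genome) are all assumptions made for illustration, and the peptides and scores are invented toy values.

```python
from collections import defaultdict

# Hypothetical tie-break order: on equal scores, credit the annotated
# databases before the whole genome search (an assumption, not from the paper).
PRIORITY = {"protein": 0, "transcript": 1, "genome": 2}


def best_match_per_spectrum(psms):
    """Pick one peptide-spectrum match per spectrum across all searches.

    psms: iterable of (spectrum_id, search, peptide, score) tuples,
    where a higher score is assumed to mean a better match.
    Returns {spectrum_id: (search, peptide, score)}.
    """
    best = {}
    for spectrum_id, search, peptide, score in psms:
        key = (score, -PRIORITY[search])
        current = best.get(spectrum_id)
        if current is None or key > (current[2], -PRIORITY[current[0]]):
            best[spectrum_id] = (search, peptide, score)
    return best


def peptides_unique_to(best, search_name):
    """Peptides whose winning matches all come from a single search."""
    searches_by_peptide = defaultdict(set)
    for search, peptide, _score in best.values():
        searches_by_peptide[peptide].add(search)
    return {p for p, s in searches_by_peptide.items() if s == {search_name}}


# Toy input; peptides and scores are invented for illustration only.
example_psms = [
    ("s1", "protein", "PEPTIDEK", 42.0),
    ("s1", "genome", "PEPTIDEK", 42.0),   # tie: credited to the protein search
    ("s2", "protein", "AAAK", 10.0),
    ("s2", "genome", "NOVELPEPR", 35.0),  # genome search wins this spectrum
    ("s3", "transcript", "LLSEK", 20.0),
]

example_best = best_match_per_spectrum(example_psms)
genome_only = peptides_unique_to(example_best, "genome")
```

Crediting ties to the annotated databases is a deliberately conservative choice in this sketch: a peptide counts as found solely via the whole genome search only when no annotated search explains the spectrum equally well.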
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3607840 | PMC |
http://dx.doi.org/10.1186/1471-2164-14-141 | DOI Listing |