Recent advances in proteomic mass spectrometry (MS) offer the chance to marry high-throughput peptide sequencing to transcript models, allowing the validation, refinement, and identification of new protein-coding loci. We present a novel pipeline that integrates highly sensitive and statistically robust peptide spectrum matching with genome-wide protein-coding predictions to perform, for the first time, large-scale gene validation and discovery in the mouse genome. By searching in excess of 10 million spectra, we have validated 32%, 17%, and 7% of all protein-coding genes, exons, and splice boundaries, respectively. Moreover, we present strong evidence for the identification of multiple alternatively spliced translations from 53 genes and have uncovered 10 entirely novel protein-coding genes that are absent from all mouse annotation data sources. One such novel gene is a fusion that spans the Ins2 and Igf2 loci, producing a transcript that encodes both the insulin II and the insulin-like growth factor 2-derived peptides. We also report nine processed pseudogenes that have unique peptide hits, demonstrating, for the first time, that they are not just transcribed but translated, and have therefore been resurrected as new coding loci. This work not only highlights an important use of MS data in genome annotation but also provides unique insights into gene structure and propagation in the mouse genome. All of these data have since been used to improve the publicly available mouse annotation in both the Vega and Ensembl genome browsers (http://vega.sanger.ac.uk).
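
The abstract does not spell out how a single peptide counts as evidence for an exon or a splice boundary. One common way to reason about it, sketched below as an illustration and not as the authors' pipeline, is to map the peptide's amino-acid span back onto the transcript's coding sequence (three nucleotides per residue) and ask which exons it overlaps and which exon-exon junctions it crosses. A peptide that matches only one locus in the searched set of translations is the kind of "unique peptide hit" invoked for the pseudogenes. All function names, the toy protein, the locus names, and the exon lengths are hypothetical, and the sketch assumes the peptide has already been identified from its spectrum and lies entirely within the annotated CDS.

```python
from itertools import accumulate


def find_peptide(protein: str, peptide: str) -> list[int]:
    """Return every 0-based start index of `peptide` within `protein`."""
    starts = []
    i = protein.find(peptide)
    while i != -1:
        starts.append(i)
        i = protein.find(peptide, i + 1)
    return starts


def peptide_nt_span(aa_start: int, aa_len: int) -> tuple[int, int]:
    """Half-open CDS nucleotide interval [start, end) covered by a peptide."""
    return 3 * aa_start, 3 * (aa_start + aa_len)


def touched_exons(cds_exon_lengths: list[int], aa_start: int, aa_len: int) -> list[int]:
    """Indices of exons whose coding bases overlap the peptide."""
    nt_start, nt_end = peptide_nt_span(aa_start, aa_len)
    touched, offset = [], 0
    for j, length in enumerate(cds_exon_lengths):
        exon_start, exon_end = offset, offset + length
        if nt_start < exon_end and nt_end > exon_start:  # half-open overlap test
            touched.append(j)
        offset = exon_end
    return touched


def spanned_junctions(cds_exon_lengths: list[int], aa_start: int, aa_len: int) -> list[int]:
    """Indices j of splice junctions (exon j -> exon j+1) the peptide crosses.

    A junction at CDS offset b counts only if the peptide has coding bases
    on both sides of it, i.e. nt_start < b < nt_end.
    """
    nt_start, nt_end = peptide_nt_span(aa_start, aa_len)
    boundaries = list(accumulate(cds_exon_lengths))[:-1]  # last exon has no 3' junction
    return [j for j, b in enumerate(boundaries) if nt_start < b < nt_end]


def _il_equivalent(seq: str) -> str:
    # Leucine and isoleucine have identical masses, so treat them as one residue.
    return seq.replace("I", "L")


def matching_loci(peptide: str, proteome: dict[str, str]) -> list[str]:
    """Names of all translated loci containing the peptide (I/L-insensitive)."""
    target = _il_equivalent(peptide)
    return [name for name, seq in proteome.items() if target in _il_equivalent(seq)]


if __name__ == "__main__":
    # Hypothetical toy data: a 22-aa protein whose 66-nt CDS is split 20/25/21 over three exons.
    protein = "MKTAYIAKQRQISFVKSHFSRQ"
    exons = [20, 25, 21]
    peptide = "QISFVK"
    for start in find_peptide(protein, peptide):
        print(start, touched_exons(exons, start, len(peptide)),
              spanned_junctions(exons, start, len(peptide)))
    # prints: 10 [1, 2] [1]

    # Hypothetical parent gene and processed pseudogene translations.
    proteome = {"geneA": protein, "pseudogeneB": "MKTAYIAKQRQLSFVKSHF"}
    print(matching_loci("QISFVK", proteome))  # shared once I/L are merged
    print(matching_loci("SHFSR", proteome))   # unique to geneA
```

The strict inequality in `spanned_junctions` is the design choice that matters: a peptide that ends exactly at an exon boundary is consistent with that exon but says nothing about which exon follows, so it is not counted as junction evidence. Folding isoleucine into leucine before testing uniqueness reflects the fact that the two residues are isobaric and are generally treated as indistinguishable in database searches; without it, a peptide could look unique to one locus when the spectrum equally supports a paralog or pseudogene.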

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3083093
DOI: http://dx.doi.org/10.1101/gr.114272.110
