Interrogating the human genome using uninterpreted mass spectrometry data.

Proteomics

Cell Mapping Project, Glaxo Wellcome R&D, Stevenage, Hertfordshire, UK.

Published: May 2001

The public availability of a draft assembly of the human genome has enabled us to demonstrate, for the first time, the feasibility of searching a complete, unmasked eukaryotic genome using uninterpreted mass spectrometry data. A complex LC-MS/MS data set, containing peptides from at least 22 human proteins, was searched against a comprehensive, nonidentical protein database, an expressed sequence tag (EST) database, and the International Human Genome Project draft assembly of the human genome. The results from the three searches are compared in detail, and the merits of the different databases for this application are discussed. In the case of the EST database, the UniGene index provided a method of simplifying and summarising the search results. In the case of the genomic DNA, the presence of introns prevented matching of roughly one quarter of the spectra, but the technique can provide primary experimental verification of predicted coding sequences, and has the potential to identify novel coding sequences.
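The intron effect described above can be illustrated with a minimal sketch. It does not reproduce the search used in the study, which matched uninterpreted MS/MS spectra against the databases; here the problem is simplified to plain peptide strings, and the exon, intron, and peptide sequences are invented for illustration. The helper names (`six_frames`, `peptide_in_genome`, `encode`) are likewise hypothetical. The sketch translates a toy genomic fragment in all six reading frames and shows that a peptide lying within a single exon is found, while a peptide spanning an exon-exon junction is present only in the spliced transcript.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One arbitrary codon per amino acid, used only to build the toy sequences.
REVERSE_TABLE = {}
for codon, aa in CODON_TABLE.items():
    REVERSE_TABLE.setdefault(aa, codon)


def encode(peptide):
    """Back-translate a peptide into one possible DNA sequence."""
    return "".join(REVERSE_TABLE[aa] for aa in peptide)


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


def peptide_in_genome(peptide, dna):
    return any(peptide in frame for frame in six_frames(dna))


# Toy gene: two exons separated by an intron (all sequences are made up).
exon1 = encode("LSEGAWK")
exon2 = encode("DTFPNR")
intron = "GTAAGTTTTCCCCTAG"  # begins GT, ends AG; length not a multiple of 3
genome = exon1 + intron + exon2
spliced = exon1 + exon2

print(peptide_in_genome("LSEGAWK", genome))  # True: lies within exon 1
print(peptide_in_genome("DTFPNR", genome))   # True: exon 2, in a shifted frame
print(peptide_in_genome("AWKDTF", genome))   # False: spans the splice junction
print("AWKDTF" in translate(spliced))        # True: present after splicing
```

The junction-spanning peptide is exactly the kind of match that a protein or EST database can supply but unspliced genomic DNA cannot, which is consistent with the loss of roughly one quarter of the spectra reported for the genomic search.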

Source
http://dx.doi.org/10.1002/1615-9861(200104)1:5<651::AID-PROT651>3.0.CO;2-N

Similar Publications

The demographic history of a population, and the distribution of fitness effects (DFE) of newly arising mutations in functional genomic regions, are fundamental factors dictating both genetic variation and evolutionary trajectories. Although both demographic and DFE inference have been performed extensively in humans, these approaches have generally either been limited to simple demographic models involving a single population or, where a complex population history has been inferred, have not accounted for the potentially confounding effects of selection at linked sites. Taking advantage of the coding-sparse nature of the genome, we propose a 2-step approach in which coalescent simulations are first used to infer a complex multi-population demographic model, utilizing large non-functional regions that are likely free from the effects of background selection.

Genomic data on Clostridioides difficile from the African continent are currently lacking, resulting in the region being under-represented in global analyses of C. difficile infection (CDI) epidemiology. For the first time in Nigeria, we utilized whole-genome sequencing and phylogenetic tools to compare C. difficile isolates from diarrhoeic human patients (n=142), livestock (n=38), poultry manure (n=5) and dogs (n=9) in the same geographic area (Makurdi, north-central Nigeria) and relate them to the global C. difficile population. In addition, selected isolates were tested for antimicrobial susceptibility (n=33) and characterized by PCR ribotyping (n=53).

RetroSeeker reveals the characteristics, expression, and evolution of a large set of novel retrotransposons.

Adv Biotechnol (Singap)

October 2023

MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, 510275, Guangdong, China.

Retrotransposons are highly prevalent in most animals and account for more than 35% of the human genome. However, the prevalence, biogenesis mechanism and function of retrotransposons remain largely unknown. Here, we developed retroSeeker, a novel computational software that identifies novel retrotransposons from pairwise alignments of genomes and decodes their biogenesis, expression, evolution and potential functions.

Coronaviruses have large, positive-sense single-stranded RNA genomes that challenge conventional strategies for mutagenesis. Yeast genetics has been used to manipulate large viral genomes, including those of herpesviruses and coronaviruses. This method, known as transformation-associated recombination (TAR), involves assembling complete viral genomes from dsDNA copies of viral genome fragments via homologous recombination in Saccharomyces cerevisiae.

SARS-CoV-2 CoCoPUTs: analyzing GISAID and NCBI data to obtain codon statistics, mutations, and free energy over a multiyear period.

Virus Evol

January 2025

Hemostasis Branch 1, Division of Hemostasis, Office of Plasma Protein Therapeutics CMC, Office of Therapeutic Products, Center for Biologics Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD 20993, USA.

A consistent area of interest since the beginning of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been the sequence composition of the virus and how it has changed over time. Many resources have been developed for the storage and analysis of SARS-CoV-2 data, such as GISAID (Global Initiative on Sharing All Influenza Data), NCBI, Nextstrain, and outbreak.info.
