Background: The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here, using common tools such as Glimmer and BLAST, we draw attention to a large number of likely genes missing from annotations.

Results: By analyzing 1,474 prokaryotic genome annotations in GenBank, we identify 13,602 likely missed genes that are homologs to non-hypothetical proteins, and 11,792 likely missed genes that are homologs only to hypothetical proteins, yet have supporting evidence of their protein-coding nature from COMBREX, a newly created gene function database. We also estimate the likelihood that each potential missing gene found is a genuine protein-coding gene using COMBREX.

Conclusions: Our analysis of the causes of missed genes suggests that larger annotation centers tend to produce annotations with fewer missed genes than smaller centers, and that many of the missed genes are short (<300 bp). Over 1,000 of the likely missed genes could be associated with phenotype information available in COMBREX. Of these, 359 genes found in pathogenic organisms may be potential targets for pharmaceutical research. The newly identified genes are available on COMBREX's website.

Reviewers: This article was reviewed by Daniel Haft, Arcady Mushegian, and M. Pilar Francino (nominated by David Ardell).
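
As a rough illustration of what searching for unannotated candidates can involve, the sketch below scans a DNA sequence for open reading frames (ORFs) and keeps those that do not overlap any annotated gene. It is not the authors' pipeline, which the abstract says relied on Glimmer, BLAST, and COMBREX; it is a minimal toy under stated assumptions: forward strand only, ATG as the only start codon, and coordinates given as 0-based half-open intervals. The function names and the `min_len` parameter are hypothetical. A real analysis would also scan the reverse strand and check each candidate for homology.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=90):
    """Return sorted (start, end) half-open intervals of forward-strand ORFs.

    Assumption: an ORF runs from the first ATG seen in a reading frame
    to the next in-frame stop codon, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def candidate_missed_orfs(seq, annotated, min_len=90):
    """ORFs that overlap none of the annotated gene intervals."""
    return [
        orf for orf in find_orfs(seq, min_len)
        if not any(overlaps(orf, gene) for gene in annotated)
    ]
```

Because the abstract reports that many missed genes are shorter than 300 bp, a length cutoff such as `min_len` matters in a scan like this: a high cutoff discards short genes, while a low one admits many spurious ORFs, which is why supporting evidence such as homology is needed.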

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3534567
DOI: http://dx.doi.org/10.1186/1745-6150-7-37
