[Comprehensive re-annotation of protein-coding genes for prokaryotic genomes by Z-curve and similarity-based methods].

Yi Chuan

Department of Biochemistry and Molecular Biology, School of Basic Medicine, Southwest Medical University, Luzhou 646000, China.

Published: July 2020

The development of sequencing technology has generated vast amounts of genomic sequence data and greatly enriched public genetic resources. To analyze such big data, algorithms and tools for genome comparison and annotation are continually updated, enabling more accurate genome annotation with a range of annotation tools. Many prokaryotic genomes in public databases were sequenced and assembled more than a decade ago, and they contain many genes of unknown function. To improve the current annotation of these genomes in NCBI, we re-annotated 1587 bacterial and archaeal genomes using multiple prokaryotic gene recognition algorithms/software and gene expression data. First, 33 Z-curve variables were used to identify sequences that had been over-annotated as protein-coding genes in the 1587 bacterial and archaeal genomes deposited in public databases; in total, 3092 sequences from 177 genomes were identified as over-annotated. Next, 4447 protein-coding genes of unknown function from 939 genomes were assigned definite functions by similarity search. Finally, we identified 2003 missed protein-coding genes belonging to known COGs (clusters of orthologous groups of proteins) in nine genomes using three methods (ZCURVE 3.0, Glimmer 3.02 and Prodigal), which are accurate, widely used for gene finding, and based on different, complementary algorithms. This is a comprehensive study re-annotating bacterial and archaeal genomes with new tools combined with multi-omics data; it should provide a reference for annotating newly sequenced strains and benefit further fundamental research using the bacterial gene sequences obtained after re-annotation.
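
The abstract does not spell out the 33 Z-curve variables. The sketch below assumes the formulation commonly described for the ZCURVE gene finder: 9 variables from phase-specific mononucleotide frequencies (codon positions 1, 2 and 3) plus 24 variables from dinucleotides spanning codon positions 1-2 and 2-3, each frequency set mapped onto the three Z-curve axes (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding). The function names `zcurve_33` and `_xyz`, and normalizing every count by the number of codons, are hypothetical illustration choices, not the authors' code; ZCURVE 3.0 may order or normalize the variables differently.

```python
from itertools import product

BASES = "ACGT"


def _xyz(f):
    """Map a frequency dict over A, C, G, T onto the three Z-curve axes.

    x: purine vs pyrimidine    (A+G) - (C+T)
    y: amino vs keto           (A+C) - (G+T)
    z: weak vs strong H-bond   (A+T) - (G+C)
    """
    a, c, g, t = (f[b] for b in BASES)
    return [(a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)]


def zcurve_33(orf: str) -> list[float]:
    """Hypothetical helper: 33 phase-specific Z-curve variables for one ORF.

    Assumed layout: 9 mononucleotide variables, then 24 dinucleotide
    variables. Non-ACGT characters are skipped in the counts, but the
    denominator stays the number of complete codons (an assumption).
    """
    seq = orf.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    codons = [seq[3 * i: 3 * i + 3] for i in range(n_codons)]

    features: list[float] = []

    # 9 variables: base composition at codon positions 1, 2 and 3.
    for pos in range(3):
        counts = {b: 0 for b in BASES}
        for cod in codons:
            if cod[pos] in counts:
                counts[cod[pos]] += 1
        features.extend(_xyz({b: counts[b] / n_codons for b in BASES}))

    # 24 variables: dinucleotides at positions (1,2) and (2,3); for each
    # first base X, the distribution of the following base is mapped to x, y, z.
    for pos in (0, 1):
        counts = {pair: 0 for pair in product(BASES, repeat=2)}
        for cod in codons:
            pair = (cod[pos], cod[pos + 1])
            if pair in counts:
                counts[pair] += 1
        for first in BASES:
            features.extend(
                _xyz({b: counts[(first, b)] / n_codons for b in BASES})
            )

    return features


if __name__ == "__main__":
    vec = zcurve_33("ATGAAATAA")
    print(len(vec))  # prints 33
```

In a re-annotation pipeline, vectors like these would typically be scored by a discriminant trained on known coding and non-coding ORFs, so that annotated genes falling on the non-coding side can be flagged as possible over-annotations; that training step is not shown here.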


Source
http://dx.doi.org/10.16288/j.yczz.20-022

