Annotation of 2,507 Saccharomyces cerevisiae genomes.

Microbiol Spectr

Microbial Technology Institute and State Key Laboratory of Microbial Technology, Shandong University, Qingdao, China.

Published: April 2024

Article Abstract

Saccharomyces cerevisiae (baker's yeast, budding yeast) is one of the most important model organisms for biological research and a crucial microorganism in industry. A large number of S. cerevisiae genome sequences are currently available in the public domain. However, these genomes are scattered across different websites, and many of them were released without annotation information. To provide a single, complete annotated genome resource, we collected 2,507 genome assemblies and re-annotated 2,506 of them using a custom annotation pipeline, producing a total of 15,407,164 protein-coding gene models. Using a custom pipeline, all of these gene sequences were clustered into families. A total of 1,506 single-copy genes were selected as marker genes and then used to evaluate the completeness and base quality of all assemblies. Pangenomic analyses were performed on a selected subset of 847 medium-to-high-quality genomes. Statistical comparisons revealed a number of gene families showing copy number variation among different organism sources. To the authors' knowledge, this study is the largest genome annotation project of S. cerevisiae to date, providing rich genomic resources for future studies of this model organism and its relatives.

IMPORTANCE

Saccharomyces cerevisiae (baker's yeast, budding yeast) is one of the most important model organisms for biological research and a crucial microorganism in industry. Although a huge number of genome sequences are available in the public domain, these genomes are scattered across different websites and most were released without annotation, hindering efficient reuse of these genome resources. Here, we collected 2,507 S. cerevisiae genomes, performed genome annotation, and evaluated genome quality. All the obtained data have been deposited in public repositories and are freely accessible to the community. This study is the largest genome annotation project of S. cerevisiae to date, providing one complete annotated genome data set for S. cerevisiae, an important workhorse for fundamental biology, biotechnology, and industry.
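The abstract does not say exactly how marker-based completeness or pangenome categories were computed. As a minimal sketch, assuming each genome has already been reduced to a table of gene-family copy numbers (a hypothetical input format), the common BUSCO-style idea of scoring single-copy markers as single-copy, duplicated, or missing, and a presence-frequency split of families into core, accessory, and unique, could look like the following. The function names, the `core_threshold` parameter, and the toy data are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def marker_completeness(family_counts: dict[str, int], markers: list[str]) -> dict[str, float]:
    """Score one genome against a marker set (hypothetical BUSCO-style categories).

    A marker family found exactly once counts as single-copy, more than once as
    duplicated, and zero times (or absent from the table) as missing.
    """
    if not markers:
        raise ValueError("marker list must not be empty")
    n = len(markers)
    single = sum(1 for m in markers if family_counts.get(m, 0) == 1)
    duplicated = sum(1 for m in markers if family_counts.get(m, 0) > 1)
    missing = n - single - duplicated
    return {"single": single / n, "duplicated": duplicated / n, "missing": missing / n}


def classify_pangenome(genomes: dict[str, dict[str, int]], core_threshold: float = 1.0) -> dict[str, str]:
    """Label each gene family by the fraction of genomes carrying at least one copy.

    Checked in order: core (fraction >= core_threshold), unique (present in
    exactly one genome), otherwise accessory.
    """
    n = len(genomes)
    presence = Counter()  # family ID -> number of genomes containing it
    for counts in genomes.values():
        for fam, copies in counts.items():
            if copies > 0:
                presence[fam] += 1

    classes = {}
    for fam, k in presence.items():
        if k / n >= core_threshold:
            classes[fam] = "core"
        elif k == 1:
            classes[fam] = "unique"
        else:
            classes[fam] = "accessory"
    return classes


# Toy data (invented for illustration): three genomes, four gene families.
genomes = {
    "g1": {"famA": 1, "famB": 1, "famC": 2},
    "g2": {"famA": 1, "famB": 0, "famD": 1},
    "g3": {"famA": 1, "famC": 1},
}
markers = ["famA", "famB"]

print(marker_completeness(genomes["g2"], markers))
# {'single': 0.5, 'duplicated': 0.0, 'missing': 0.5}
print(classify_pangenome(genomes))
# {'famA': 'core', 'famB': 'unique', 'famC': 'accessory', 'famD': 'unique'}
```

Lowering `core_threshold` (for example to 0.95) gives a "soft-core" definition that tolerates a few incomplete assemblies; whether the study used any such threshold is not stated in the abstract.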

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10986567
DOI: http://dx.doi.org/10.1128/spectrum.03582-23