Guidelines for Gene and Genome Assembly Nomenclature.

Genetics

EMBL-EBI - Non-Vertebrate Genomics Team, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.

Published: January 2025

The rapid increase in the number of reference-quality genome assemblies presents significant new opportunities for genomic research. However, the absence of standardized naming conventions for genome assemblies and annotations across datasets creates substantial challenges. Inconsistent naming hinders the identification of correct assemblies, complicates the integration of bioinformatics pipelines, and makes it difficult to link assemblies across multiple resources. To address this, we developed a specification for standardizing the naming of reference genome assemblies, to improve consistency across datasets and facilitate interoperability. This specification was created with FAIR (Findable, Accessible, Interoperable, and Reusable) practices in mind, ensuring that reference assemblies are easier to locate, access, and reuse across research communities. Additionally, it is designed to comply with the requirements of primary genomic data repositories, including members of the International Nucleotide Sequence Database Collaboration (INSDC), ensuring compatibility with widely used databases. While initially tailored to the agricultural genomics community, the specification is adaptable for use across different taxa. Widespread adoption of this standardized nomenclature would streamline assembly management, better enable cross-species analyses, and improve the reproducibility of research. It would also enhance natural language processing applications that depend on consistent reference assembly names in the genomic literature, promoting greater integration and automated analysis of genomic data. This is a good time to consider more consistent genomic data nomenclature, as many research communities and data resources now find themselves juggling multiple datasets from multiple data providers.

DOI: http://dx.doi.org/10.1093/genetics/iyaf006

Similar Publications

Complete genome sequences of associated with mortality events in avian species.

Microbiol Resour Announc

January 2025

Department of Infectious Diseases, Athens Veterinary Diagnostic Laboratory, The University of Georgia, Athens, Georgia, USA.

is a potential bacterial pathogen that affects chickens. We present 22 complete genome sequences of clinical isolates to facilitate the genomic analysis and the development of diagnostic tools.

Machine learning reveals the dynamic importance of accessory sequences for outbreak clustering.

mBio

January 2025

Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada.

Bacterial typing at whole-genome scales is now feasible owing to decreasing costs in high-throughput sequencing and recent advances in computation. The unprecedented resolution of whole-genome typing is achieved by genotyping the variable segments of bacterial genomes, which can fluctuate significantly in gene content. However, due to the transient and hypervariable nature of many accessory elements, the value of the added resolution in outbreak investigations remains disputed.

Cloning a Chloroplast Genome in and .

Bio Protoc

January 2025

Biochemistry Department, Western University, London, Canada.

Chloroplast genomes present an alternative strategy for large-scale engineering of photosynthetic eukaryotes. Prior to our work, the chloroplast genomes of (204 kb) and (140 kb) had been cloned using bacterial and yeast artificial chromosome (BAC/YAC) libraries, respectively. These methods lack design flexibility as they are reliant upon the random capture of genomic fragments during BAC/YAC library creation; additionally, both demonstrated a low efficiency (≤ 10%) for correct assembly of the genome in yeast.

Graph-based pangenome provides insights into the structural variation and genetic basis of metabolic traits in potato.

Mol Plant

January 2025

State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Agriculture, Yunnan University, Kunming 650540, China; Southwest United Graduate School, Kunming 650500, China.

Potato is the world's most important nongrain crop. Here, we report that 29 genomes from Petota and Etuberosum sections were de novo assembled, and that 248 accessions of wild potatoes, landraces and modern cultivars were re-sequenced at > 25× depth to assess genetic diversity within the Petota section. Subsequently, a graph-based pangenome was constructed by using DM8.

Antarctic Geothermal Soils Exhibit an Absence of Regional Habitat Generalist Microorganisms.

Environ Microbiol

January 2025

Thermophile Research Unit, Te Aka Mātuatua, School of Science, Te Whare Wānanga o Waikato, University of Waikato, Hamilton, Aotearoa-New Zealand.

Active geothermal systems are relatively rare in Antarctica and represent metaphorical islands ideal to study microbial dispersal. In this study, we tested the macro-ecological concept that high dispersal rates result in communities being dominated by either habitat generalists or specialists by investigating the microbial communities on four geographically separated geothermal sites on three Antarctic volcanoes (Mts. Erebus, Melbourne, and Rittman).
