"METAGENOTE: a simplified web platform for metadata annotation of genomic samples and streamlined submission to NCBI's sequence read archive".

BMC Bioinformatics

Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD, 20892, USA.

Published: September 2020

Background: Improvements in genomics methods, coupled with readily accessible high-throughput sequencing, have contributed to our understanding of microbial species, metagenomes, infectious diseases and more. To maximize the impact of these genomics studies, it is important that data from biological samples become publicly available with standardized metadata. The availability of data in public archives raises the hope that greater insights can be obtained through integration with multi-omics data, reproducibility of published studies, or meta-analyses of large, diverse datasets. These datasets should include a description of the host, organism, environmental source of the specimen, spatial-temporal information and other relevant metadata. Unfortunately, these attributes are often missing and, when present, are inconsistent in their use of metadata standards and ontologies.

Results: METAGENOTE (https://metagenote.niaid.nih.gov) is a web portal that facilitates the annotation of samples from genomic studies and streamlines the submission of sequencing files and metadata to the Sequence Read Archive (SRA) (Leinonen R, et al, Nucleic Acids Res, 39:D19-21, 2011) for public access. The platform offers a wide selection of packages for different types of biological and experimental studies, with a special emphasis on the standardization of metadata reporting. These packages follow the guidelines of the MIxS standards developed by the Genomic Standards Consortium (GSC) and adopted by the three partners of the International Nucleotide Sequence Database Collaboration (INSDC) (Cochrane G, et al, Nucleic Acids Res, 44:D48-50, 2016): the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). METAGENOTE then compiles, validates and manages the submission through an easy-to-use web interface, minimizing submission errors and eliminating the need to submit sequencing files via a separate file transfer mechanism.
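The kind of check implied by "validates" can be illustrated with a minimal sketch. It is not METAGENOTE's implementation: the function name `validate_sample`, the `REQUIRED_FIELDS` tuple and the choice of four attributes are assumptions made for illustration, and real MIxS packages define many more attributes with their own formats.

```python
from datetime import date

# Hypothetical, illustrative subset of required attributes.
# Real MIxS packages define many more fields than these.
REQUIRED_FIELDS = ("organism", "collection_date", "geo_loc_name", "lat_lon")


def validate_sample(sample):
    """Return a list of human-readable problems found in one sample record."""
    problems = []

    # Every required attribute must be present and non-empty.
    for field in REQUIRED_FIELDS:
        value = sample.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")

    # If a collection date is given, require the YYYY-MM-DD form.
    collection_date = sample.get("collection_date")
    if collection_date is not None and str(collection_date).strip():
        try:
            date.fromisoformat(str(collection_date).strip())
        except ValueError:
            problems.append("collection_date is not an ISO 8601 date (YYYY-MM-DD)")

    return problems
```

Returning a list of problems, rather than raising on the first one, lets a submitter see every issue in a record at once before anything is sent to an archive.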

Conclusions: METAGENOTE is a public resource that focuses on simplifying the annotation and submission of data with its corresponding metadata. Users of METAGENOTE will benefit from the easy-to-use annotation interface and, most importantly, will be encouraged to publish metadata following standards and ontologies that make public data available for reuse.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471527
DOI: http://dx.doi.org/10.1186/s12859-020-03694-0

Publication Analysis

Top Keywords

metadata: 8
sequence read: 8
metadata standards: 8
submission process: 8
sequencing files: 8
nucleic acids: 8
acids res: 8
data: 6
submission: 5
metagenote simplified: 4

Similar Publications

Neurochemical Databases: Purpose and Expectations.

ACS Chem Neurosci

December 2024

University of Bordeaux, CNRS, Institut des Neurosciences Intégratives et Cognitives d'Aquitaine INCIA CNRS UMR5287, F-33000 Bordeaux, France.

The exploration of increasingly specific brain structures and their relationships, in more nuanced ways, has facilitated the generation of databases for gene expression, connectivity, cell morphology, and electrophysiology. However, neurochemistry, the study of neurochemical environment and transmission, has not yet warranted a public database, despite the plethora of data published. From our viewpoint, a neurochemical database is overdue and would allow the field of neurochemistry to develop, facilitating standardization and reference values, reproducibility, resource efficiency, preservation and accessibility of raw data, hypothesis development and exploration, and metadata analysis.

This paper presents the Cadenza Woodwind Dataset. This publicly available data is synthesised audio for woodwind quartets including renderings of each instrument in isolation. The data was created to be used as training data within Cadenza's second open machine learning challenge (CAD2) for the task on rebalancing classical music ensembles.

Motivation: Microbial signatures in the human microbiome are closely associated with various human diseases, driving the development of machine learning models for microbiome-based disease prediction. Despite progress, challenges remain in enhancing prediction accuracy, generalizability, and interpretability. Confounding factors, such as host's gender, age, and body mass index, significantly influence the human microbiome, complicating microbiome-based predictions.

Motivation: We are witnessing an enormous growth in the amount of molecular profiling (-omics) data. The integration of multi-omics data is challenging. Moreover, human multi-omics data may be privacy-sensitive and can be misused to de-anonymize and (re-)identify individuals.

Background: Amplicon sequencing of kingdom-specific tags, such as the 16S rRNA gene for bacteria and the internal transcribed spacer (ITS) region for fungi, is widely used for investigating microbial communities. So far, most human studies have focused on bacteria, while studies on host-associated fungi in health and disease have only recently started to accumulate. To enable cost-effective parallel analysis of bacterial and fungal communities in human and environmental samples, we developed a method in which 16S rRNA gene and ITS1 amplicons were pooled together for a single Illumina MiSeq or HiSeq run and analysed after primer-based segregation.
