Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpretation of data including transcriptomics, proteomics, sequence variation and comparative analyses based on reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) leverages both automatic processes and expert curation to create a robust set of reference sequences of genomic, transcript and protein data spanning the tree of life.
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence repository and the PubMed® repository of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 31 distinct repositories and knowledgebases. The E-utilities serve as the programming interface for most of these.
Genomic sequencing of clinical samples to identify emerging variants of SARS-CoV-2 has been a key public health tool for curbing the spread of the virus. As a result, an unprecedented number of SARS-CoV-2 genomes were sequenced during the COVID-19 pandemic, which allowed for rapid identification of genetic variants, enabling the timely design and testing of therapies and deployment of new vaccine formulations to combat the new variants. However, despite the technological advances of deep sequencing, the analysis of the raw sequence data generated globally is neither standardized nor consistent, leading to vastly disparate sequences that may impact identification of variants.
Fast, efficient public health actions require well-organized and coordinated systems that can supply timely and accurate knowledge. Public databases of pathogen genomic data, such as the International Nucleotide Sequence Database Collaboration (INSDC), have become essential tools for efficient public health decisions. However, these international resources began primarily for academic purposes, rather than for surveillance or interventions.
In tissues and organs, the extracellular matrix (ECM) helps maintain inter- and intracellular architectures that sustain the structure-function relationships defining physiological homeostasis. Combining fiber scaffolds and cells to form engineered tissues is a means of replicating these relationships. Engineered tissues' fiber scaffolds are designed to mimic the topology and chemical composition of the ECM network.
Nucleic Acids Res
January 2024
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.
This article summarises the activities of the Bacterial Viruses Subcommittee of the International Committee on Taxonomy of Viruses for the period of March 2021-March 2022. We provide an overview of the new taxa proposed in 2021, approved by the Executive Committee, and ratified by vote in 2022. Significant changes to the taxonomy of bacterial viruses were introduced: the paraphyletic morphological families Podoviridae, Siphoviridae, and Myoviridae as well as the order Caudovirales were abolished, and a binomial system of nomenclature for species was established.
During the COVID-19 pandemic, SARS-CoV-2 surveillance efforts integrated genome sequencing of clinical samples to identify emergent viral variants and to support rapid experimental examination of genome-informed vaccine and therapeutic designs. Given the broad range of methods applied to generate new viral genomes, it is critical that consensus and variant calling tools yield consistent results across disparate pipelines. Here we examine the impact of sequencing technologies (Illumina and Oxford Nanopore) and 7 different downstream bioinformatic protocols on SARS-CoV-2 variant calling as part of the NIH Accelerating COVID-19 Therapeutic Interventions and Vaccines (ACTIV) Tracking Resistance and Coronavirus Evolution (TRACE) initiative, a public-private partnership established to address the COVID-19 outbreak.
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.
The COVID-19 pandemic has seen the persistent emergence of immune-evasive SARS-CoV-2 variants under the selection pressure of natural and vaccination-acquired immunity. However, it is currently challenging to quantify how immunologically distinct a new variant is compared to all the prior variants to which a population has been exposed. Here, we define "Distinctiveness" of SARS-CoV-2 sequences based on a proteome-wide comparison with all prior sequences from the same geographical region.
Nucleic Acids Res
January 2022
The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.
The Sequence Read Archive (SRA, https://www.ncbi.nlm.
Historically, virus taxonomy has been limited to describing viruses that were readily cultivated in the laboratory or emerging in natural biomes. Metagenomic analyses, single-particle sequencing, and database mining efforts have yielded new sequence data on an astounding number of previously unknown viruses. As metagenomes are relatively free of biases, these data provide an unprecedented insight into the vastness of the virosphere, but to properly value the extent of this diversity it is critical that the viruses are taxonomically classified.
Sequence Read Archive submissions to the National Center for Biotechnology Information often lack useful metadata, which limits the utility of these submissions. We describe the Sequence Taxonomic Analysis Tool (STAT), a scalable k-mer-based tool for fast assessment of taxonomic diversity intrinsic to submissions, independent of metadata. We show that our MinHash-based k-mer tool is accurate and scalable, offering reliable criteria for efficient selection of data for further analysis by the scientific community, at once validating submissions while also augmenting sample metadata with reliable, searchable, taxonomic terms.
In this article, we - the Bacterial Viruses Subcommittee and the Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV) - summarise the results of our activities for the period March 2020 - March 2021. We report the division of the former Bacterial and Archaeal Viruses Subcommittee in two separate Subcommittees, welcome new members, a new Subcommittee Chair and Vice Chair, and give an overview of the new taxa that were proposed in 2020, approved by the Executive Committee and ratified by vote in 2021. In particular, a new realm, three orders, 15 families, 31 subfamilies, 734 genera and 1845 species were newly created or redefined (moved/promoted).
Human respiratory syncytial virus (HRSV) is the leading viral cause of serious pediatric respiratory disease, and lifelong reinfections are common. Its 2 major subgroups, A and B, exhibit some antigenic variability, enabling HRSV to circulate annually. Globally, research has increased the number of HRSV genomic sequences available.
Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources.
Nucleic Acids Res
January 2021
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 34 distinct databases. The E-utilities serve as the programming interface for the Entrez system.
Background: GenBank contains over 3 million viral sequences. The National Center for Biotechnology Information (NCBI) previously made available a tool for validating and annotating influenza virus sequences that is used to check submissions to GenBank. Before this project, there was no analogous tool in use for non-influenza viral sequence submissions.
This article is a summary of the activities of the ICTV's Bacterial and Archaeal Viruses Subcommittee for the years 2018 and 2019. Highlights include the creation of a new order, 10 families, 22 subfamilies, 424 genera and 964 species. Some of our concerns about the ICTV's ability to adjust to and incorporate new DNA- and protein-based taxonomic tools are discussed.
Nucleic Acids Res
January 2020
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for the Entrez system.
Background: The January 1, 2018 closure of Memorial Hospital of RI (MHRI) has anecdotally resulted in operational strain for the area's remaining EDs. This study seeks to evaluate the impact on neighboring facilities.
Methods: An interrupted time-series analysis was conducted to compare operational outcomes and demographics pre- and post-MHRI closure.
Tailed bacteriophages are the most abundant and diverse viruses in the world, with genome sizes ranging from 10 kbp to over 500 kbp. Yet, due to historical reasons, all this diversity is confined to a single virus order, Caudovirales, composed of just four families: Myoviridae, Siphoviridae, Podoviridae, and the newly created Ackermannviridae family. In recent years, this morphology-based classification scheme has started to crumble under the constant flood of phage sequences, revealing that tailed phages are even more genetically diverse than once thought.