Reference sequences and annotations serve as the foundation for many lines of research today, from organism and sequence identification to providing a core description of the genes, transcripts and proteins found in an organism's genome. Interpretation of data including transcriptomics, proteomics, sequence variation and comparative analyses based on reference gene annotations informs our understanding of gene function and possible disease mechanisms, leading to new biomedical discoveries. The Reference Sequence (RefSeq) resource created at the National Center for Biotechnology Information (NCBI) leverages both automatic processes and expert curation to create a robust set of reference sequences of genomic, transcript and protein data spanning the tree of life.
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases.
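As a rough illustration of what "programming interface" means here, the sketch below builds a search request URL for the E-utilities ESearch endpoint. The helper name `build_esearch_url` and the example query are illustrative assumptions and do not come from the abstract. The snippet only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch request URL (hypothetical helper, for illustration).

    db     -- name of the Entrez database to search, e.g. "pubmed"
    term   -- the query string, in Entrez search syntax
    retmax -- maximum number of record identifiers to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-escapes characters such as "[" and "]" in the query.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query (an assumption): PubMed records with "RefSeq" in the title.
url = build_esearch_url("pubmed", "RefSeq[Title]", retmax=5)
print(url)
```

Fetching the resulting URL with any HTTP client should return a list of matching record identifiers. A real client would also need to follow NCBI's usage guidelines on request rate.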
Comprehensive genome annotation is essential to understand the impact of clinically relevant variants. However, the absence of a standard for clinical reporting and browser display complicates the process of consistent interpretation and reporting. To address these challenges, Ensembl/GENCODE and RefSeq launched a joint initiative, the Matched Annotation from NCBI and EMBL-EBI (MANE) collaboration, to converge on human gene and transcript annotation and to jointly define a high-value set of transcripts and corresponding proteins.
Eukaryotic genomes contain many nongenic elements that function in gene regulation, chromosome organization, recombination, repair, or replication, and mutation of those elements can affect genome function and cause disease. Although numerous epigenomic studies provide high coverage of gene regulatory regions, those data are not usually exposed in traditional genome annotation and can be difficult to access and interpret without field-specific expertise. The National Center for Biotechnology Information (NCBI) therefore provides RefSeq Functional Elements (RefSeqFEs), which represent experimentally validated human and mouse nongenic elements derived from the literature.
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID).
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.
Complete and accurate annotation of the mouse genome is critical to the advancement of research conducted on this important model organism. The National Center for Biotechnology Information (NCBI) develops and maintains many useful resources to assist the mouse research community. In particular, the reference sequence (RefSeq) database provides high-quality annotation of multiple mouse genome assemblies using a combinatorial approach that leverages computation, manual curation, and collaboration.
The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of annotated genomic, transcript and protein sequence records derived from data in public sequence archives and from computation, curation and collaboration (http://www.ncbi.nlm.
The Consensus Coding Sequence (CCDS) collaboration involves curators at multiple centers with a goal of producing a conservative set of high quality, protein-coding region annotations for the human and mouse reference genome assemblies. The CCDS data set reflects a 'gold standard' definition of best supported protein annotations, and corresponding genes, which pass a standard series of quality assurance checks and are supported by manual curation. This data set supports use of genome annotation information by human and mouse researchers for effective experimental design, analysis and interpretation.
Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, different methods are used that can result in similar but not identical representation of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers.
Gene therapy has emerged from the idea of inserting a wild-type copy of a gene in order to restore the proper expression and function of a damaged gene. Initial efforts have focused on finding the proper vector and delivery method to introduce a corrected gene to the affected tissue or cell type. Even though these first attempts are clearly promising, several problems remain unsolved.
The sequencing of the complete genomes of several organisms, including humans, has so far contributed little to our understanding of the mechanisms that regulate gene expression as developmental programs are carried out. In this so-called "postgenomic" era, we still do not understand how (if at all) the long-range organization of the genome is related to its function. The domain hypothesis of eukaryotic genome organization postulates that the genome is subdivided into a number of semi-independent functional units (domains), each of which may include one or several functionally related genes, has well-defined borders, and operates under the control of special (domain-level) regulatory systems.
In order to create an extended map of chromatin features within a mammalian multigene locus, we have determined the extent of nuclease sensitivity and the pattern of histone modifications associated with the mouse beta-globin genes in adult erythroid tissue. We show that the nuclease-sensitive domain encompasses the beta-globin genes along with several flanking olfactory receptor genes that are inactive in erythroid cells. We describe enhancer-blocking or boundary elements on either side of the locus that are bound in vivo by the transcription factor CTCF, but we found that they do not coincide with transitions in nuclease sensitivity flanking the locus or with patterns of histone modifications within it.
Stably integrated transgenes flanked by the chicken beta-globin HS4 insulator are protected against chromosomal position effects and gradual extinction of expression during long-term propagation in culture. To investigate the mechanism of action of this insulator, we used bisulfite genomic sequencing to examine the methylation of individual CpG sites within insulated transgenes, and compared this with patterns of histone acetylation. Surprisingly, although the histones of the entire insulated transgene are highly acetylated, only a specific region in the promoter, containing binding sites for erythroid-specific transcription factors, is highly protected from DNA methylation.
A binding site for the transcription factor CTCF is responsible for enhancer-blocking activity in a variety of vertebrate insulators, including the insulators at the 5' and 3' chromatin boundaries of the chicken beta-globin locus. To date, no functional domain boundaries have been defined at mammalian beta-globin loci, which are embedded within arrays of functional olfactory receptor genes. In an attempt to define boundary elements that could separate these gene clusters, we searched for CTCF-binding sites at the most distal DNase I-hypersensitive sites (HSs) of the mouse and human beta-globin loci.