Purpose: The ongoing lack of data standardization severely undermines the potential for automated learning from the vast amount of information routinely archived in electronic health records (EHRs), radiation oncology information systems, treatment planning systems, and other cancer care and outcomes databases. We sought to create a standardized ontology for clinical data, social determinants of health, and other radiation oncology concepts and interrelationships.
Methods and Materials: The American Association of Physicists in Medicine's Big Data Science Committee was initiated in July 2019 to explore common ground from the stakeholders' collective experience of issues that typically compromise the formation of large inter- and intra-institutional databases from EHRs.
Caminibacter mediatlanticus strain TB-2ᵀ [1] is a thermophilic, anaerobic, chemolithoautotrophic bacterium isolated from the walls of an active deep-sea hydrothermal vent chimney on the Mid-Atlantic Ridge, and it is the type strain of the species. C. mediatlanticus is a Gram-negative member of the Epsilonproteobacteria (order Nautiliales) that grows chemolithoautotrophically with H₂ as the energy source and CO₂ as the carbon source.
The JCVI metagenomics analysis pipeline provides for the efficient and consistent annotation of shotgun metagenomics sequencing data for sampling communities of prokaryotic organisms. The process can be equally applied to individual sequence reads from traditional Sanger capillary electrophoresis sequences, newer technologies such as 454 pyrosequencing, or sequence assemblies derived from one or more of these data types. It includes the analysis of both coding and non-coding genes, whether full-length or, as is often the case for shotgun metagenomics, fragmentary.
The HuRef Genome Browser is a web application for the navigation and analysis of the previously published genome of a human individual, termed HuRef. The browser provides a comparative view between the NCBI human reference sequence and the HuRef assembly, and it enables the navigation of the HuRef genome in the context of HuRef, NCBI and Ensembl annotations. Single nucleotide polymorphisms, indels, inversions, structural and copy-number variations are shown in the context of existing functional annotations on either genome in the comparative view.
This meeting report summarizes the proceedings of the "eGenomics: Cataloguing our Complete Genome Collection IV" workshop held June 6-8, 2007, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This fourth workshop of the Genomic Standards Consortium (GSC) was a mix of short presentations, strategy discussions, and technical sessions. Speakers provided progress reports on the development of the "Minimum Information about a Genome Sequence" (MIGS) specification and the closely integrated "Minimum Information about a Metagenome Sequence" (MIMS) specification.
The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly-typed schema will provide a diverse set of descriptors for describing the exact origin and processing of a biological sample, from sampling to sequencing, and subsequent analysis.
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC).
Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region.
We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at approximately 1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements.
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.
The CAMERA (Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis) community database for metagenomic data deposition is an important first step in developing methods for monitoring microbial communities.
Since its introduction a decade ago, whole-genome shotgun sequencing (WGS) has been the main approach for producing cost-effective and high-quality genome sequence data. Until now, the Sanger sequencing technology that has served as a platform for WGS has not been truly challenged by emerging technologies. The recent introduction of the pyrosequencing-based 454 sequencing platform (454 Life Sciences, Branford, CT) offers a very promising sequencing technology alternative for incorporation in WGS.
Proc Natl Acad Sci U S A
February 2004
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.
The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about 10% smaller than the human genome, owing to a lower repetitive DNA content. Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22.