Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research.
Motivation: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.
Results: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects.
The API and associated software are open source and currently available at https://github.com/NCATS-Tangerine/translator-knowledge-beacon.
Rice, the primary source of dietary calories for half of humanity, is the first crop plant for which a high-quality reference genome sequence from a single variety was produced. We used resequencing microarrays to interrogate 100 Mb of the unique fraction of the reference genome for 20 diverse varieties and landraces that capture the impressive genotypic and phenotypic diversity of domesticated rice. Here, we report the distribution of 160,000 nonredundant SNPs.
The Rice Annotation Project Database (RAP-DB) was created to provide the genome sequence assembly of the International Rice Genome Sequencing Project (IRGSP), manually curated annotation of the sequence, and other genomics information useful for a comprehensive understanding of rice biology. Since the last publication of the RAP-DB, the IRGSP genome has been revised and reassembled. In addition, a large number of rice expressed sequence tags have been released, and functional genomics resources have been produced worldwide.
The Generation Challenge Programme (GCP; www.generationcp.org) has developed an online resource documenting stress-responsive genes comparatively across plant species.
Ambiguous germplasm identification; difficulty in tracing pedigree information; and lack of integration between genetic resources, characterization, breeding, evaluation, and utilization data are constraints in developing knowledge-intensive crop improvement programs. To address these constraints, the International Crop Information System (www.icis.
Motivation: The high content of repetitive sequences in the genomes of many higher eukaryotes renders the task of annotating them computationally intensive. Presently, the only widely accepted method of searching and annotating transposable elements (TEs) in large genomic sequences is the use of the RepeatMasker program, which identifies new copies of TEs by pairwise sequence comparisons with a library of known TEs. Profile hidden Markov models (HMMs) have been used successfully in discovering distant homologs of known proteins in large protein databases, but this approach has only rarely been applied to known model TE families in genomic DNA.
The International Rice Information System (IRIS, http://www.iris.irri.