Publications by authors named "Mungall C"

Ontologies and knowledge graphs (KGs) are general-purpose computable representations of some domain, such as human anatomy, and are frequently a crucial part of modern information systems. Most of these structures change over time, incorporating new knowledge or information that was previously missing. Managing these changes is a challenge, both in terms of communicating changes to users and providing mechanisms to make it easier for multiple stakeholders to contribute.

View Article and Find Full Text PDF

Whole genome sequencing has transformed rare disease research; however, 50-80% of rare disease patients remain undiagnosed after such testing. Regular reanalysis can identify new diagnoses, especially for newly discovered disease-gene associations, but efficient tools are required to support clinical interpretation. Exomiser, a phenotype-driven variant prioritisation tool, fulfils this role; within the 100,000 Genomes Project (100kGP), reanalysis identified diagnoses in 463 (2%) of 24,015 patients who had remained unsolved after previous analysis for variants in known disease genes.

Article Synopsis
  • Biomedical research is increasingly integrating artificial intelligence (AI) and machine learning (ML) to tackle complex challenges, necessitating a focus on ethical and explainable AI (XAI) due to the complexities of deep learning methods.
  • The NIH's Bridge2AI program is working on creating new flagship datasets aimed at enhancing AI/ML applications in biomedicine while establishing best practices, tools, standards, and criteria for assessing the data's AI readiness, including legal and ethical considerations.
  • The article outlines foundational criteria developed by the NIH Bridge2AI Standards Working Group to ensure the scientific rigor and ethical use of AI in biomedical research, emphasizing the need for ongoing adaptation as the field evolves.
Article Synopsis
  • Ontologies are key for managing consensus knowledge in areas like biomedical, environmental, and food sciences, but creating and maintaining them requires significant resources and collaboration among experts.
  • The Dynamic Retrieval Augmented Generation of Ontologies using AI (DRAGON-AI) leverages Large Language Models and Retrieval Augmented Generation to automate the generation of ontology components, showing high precision in relationship creation and ability to produce acceptable definitions.
  • While DRAGON-AI can significantly support ontology development, expert curators remain essential for overseeing the quality and accuracy of the generated content.
Article Synopsis
  • The GA4GH Phenopacket Schema, released in 2022 and approved as a standard by ISO, allows the sharing of clinical and genomic data, including phenotypic descriptions and genetic information, to aid in genomic diagnostics.
  • Phenopacket Store Version 0.1.19 offers a collection of 6668 phenopackets linked to various diseases and genes, making it a crucial resource for testing algorithms and software in genomic research.
  • This collection represents the first extensive case-level, standardized phenotypic information sourced from medical literature, supporting advancements in diagnostic genomics and machine learning applications.
Article Synopsis
  • Phenotypic data helps us understand how genomic variations affect living organisms and is vital for clinical applications like diagnosing diseases and developing treatments.
  • The field of phenomics aims to unify and analyze the vast amounts of phenotypic data collected over time, but faces challenges due to inconsistent methods and vocabularies used to record this information.
  • The Unified Phenotype Ontology (uPheno) framework offers a solution by providing a standardized system for organizing phenotype terms, allowing for better integration of data across different species and improving research on genotype-phenotype associations.
Article Synopsis
  • Advanced omics technologies produce a large amount of data daily, but often lack the necessary metadata, making it difficult for researchers to effectively access and utilize the data.
  • Machine learning (ML) techniques are emerging as solutions for automatically annotating these datasets, but the process of text labeling to validate this metadata remains manual and time-consuming, highlighting the need for automation.
  • This paper presents two new automated text labeling approaches aimed at improving metadata validation in environmental genomics, utilizing relationships between data sources and controlled vocabularies to enhance the efficiency and effectiveness of metadata extraction.

Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for the clinical management of patients. Currently, manual biocuration is used to assign MAxO terms to rare diseases, enabling clinical management of rare diseases to be described computationally for use in clinical decision support and mechanism discovery.

Article Synopsis
  • Large language models (LLMs) are being tested for their ability to help diagnose genetic diseases, but their evaluation is complicated by the unstructured responses they generate.
  • Researchers benchmarked LLMs against 5,213 case reports using established phenotypic criteria and compared their performance to a traditional diagnostic tool, Exomiser.
  • The best-performing LLM correctly diagnosed cases 23.6% of the time, while Exomiser achieved 35.5%, indicating that while LLMs are improving, they still lag behind conventional bioinformatics methods and need further research for effective integration into diagnostic processes.

Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs, ultimately hindering the development of effective prioritisation tools.


Background: The lack of universally adopted data standards in veterinary science hinders data interoperability, and therefore integration and comparison; this ultimately impedes the application of existing information-based tools to support advancement in veterinary diagnostics, treatments, and precision medicine.

Hypothesis/Objectives: Creation of a Vertebrate Breed Ontology (VBO) as a single, coherent, logic-based standard for documenting breed names in animal health, production and research-related records will improve data use capabilities in veterinary and comparative medicine.

Animals: No live animals were used in this study.


The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms.
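As a rough illustration of the kind of structured input file the abstract describes, the sketch below builds a minimal phenopacket-style record in Python. It is abbreviated and hypothetical, not a complete or validated GA4GH Phenopacket: the field names follow the general shape of the Phenopacket Schema, but a real phenopacket also carries required metadata such as ontology resource versions.

```python
import json

# Minimal phenopacket-style record (sketch only; not schema-validated).
# The HPO identifiers are real terms; the patient and packet IDs are invented.
phenopacket = {
    "id": "example-phenopacket-1",
    "subject": {"id": "patient-1", "sex": "FEMALE"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0001250", "label": "Seizure"}},
        {"type": {"id": "HP:0001263", "label": "Global developmental delay"}},
    ],
}

# Serialise to JSON, the interchange form consumed by downstream tools.
print(json.dumps(phenopacket, indent=2))
```

A diagnostic pipeline would typically read many such JSON files, one per case, and use the `phenotypicFeatures` term IDs to drive phenotype-aware prioritisation.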

Article Synopsis
  • Comparative analysis of genomes requires collecting and integrating high-quality data using standardized protocols.
  • The Genomic Standards Consortium (GSC) works with researchers to create and uphold the MIxS reporting standard for genomic data.
  • This document explains MIxS structure and terminology, guides users on finding necessary terms, and includes a practical example using soil metagenomes.

Objective: Female reproductive disorders (FRDs) are common health conditions that may present with significant symptoms. Diet and environment are potential areas for FRD interventions. We utilized a knowledge graph (KG) method to predict factors associated with common FRDs (for example, endometriosis, ovarian cyst, and uterine fibroids).

Article Synopsis
  • Translational research needs data from different levels of biological systems, but combining that data is tough for scientists.
  • New technologies help gather more data, but researchers struggle to organize all the information effectively.
  • PheKnowLator is a tool that helps scientists create customizable knowledge graphs easily, making it better for managing complex health information without slowing down their work.

Bridging molecular information to ecosystem-level processes would provide the capacity to understand system vulnerability and, potentially, a means for assessing ecosystem health. Here, we present an integrated dataset containing environmental and metagenomic information from plant-associated microbial communities, plant transcriptomics, plant and soil metabolomics, and soil chemistry and activity characterization measurements derived from the model tree species Populus trichocarpa. Soil, rhizosphere, root endosphere, and leaf samples were collected from 27 different P.


Motivation: Graph representation learning is a family of related approaches that learn low-dimensional vector representations of nodes and other graph elements called embeddings. Embeddings approximate characteristics of the graph and can be used for a variety of machine-learning tasks such as novel edge prediction. For many biomedical applications, partial knowledge exists about positive edges that represent relationships between pairs of entities, but little to no knowledge is available about negative edges that represent the explicit lack of a relationship between two nodes.
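The open-world problem the abstract raises is commonly handled by *sampling* presumed negatives: pairs of nodes with no known positive edge between them. The toy sketch below (my own illustration, not the paper's method) shows the naive version of that step, with the caveat built into the docstring.

```python
import random

def sample_negative_edges(nodes, positive_edges, k, seed=0):
    """Sample k undirected node pairs absent from the known positive edges.

    Caveat: absent pairs are only *presumed* negative -- the explicit lack of
    a relationship is rarely recorded, which is the gap the abstract notes.
    Assumes k does not exceed the number of available non-positive pairs.
    """
    rng = random.Random(seed)
    positives = {frozenset(e) for e in positive_edges}
    negatives = set()
    while len(negatives) < k:
        u, v = rng.sample(nodes, 2)
        pair = frozenset((u, v))
        if pair not in positives:
            negatives.add(pair)
    return [tuple(sorted(p)) for p in negatives]

# Hypothetical gene-interaction example.
nodes = ["geneA", "geneB", "geneC", "geneD"]
positive_edges = [("geneA", "geneB"), ("geneB", "geneC")]
print(sample_negative_edges(nodes, positive_edges, 2))
```

The sampled pairs can then serve as negative training examples for an edge-prediction classifier over the learned embeddings.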


The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility and expressivity in an increasingly decentralised genomic data ecosystem.


Motivation: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data and are not able to populate arbitrarily complex nested knowledge schemas.

Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema.
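To make the idea concrete, here is an illustrative sketch of schema-guided, zero-shot extraction in the spirit of SPIRES. It is not the actual implementation (that is the OntoGPT package): the schema, prompt wording, canned text, and `call_llm` stub below are all hypothetical, standing in for a real model call.

```python
import json

# Hypothetical extraction schema: field name -> description shown to the model.
SCHEMA = {
    "disease": "name of the disease",
    "genes": "list of gene symbols",
    "phenotypes": "list of phenotype labels",
}

def build_prompt(schema, text):
    """Turn a schema into a flexible zero-shot prompt, as SPIRES-style tools do."""
    fields = "\n".join(f"- {k}: {v}" for k, v in schema.items())
    return (
        "Extract the following fields from the text and answer as JSON "
        f"with exactly these keys:\n{fields}\n\nText: {text}"
    )

def call_llm(prompt):
    # Stub standing in for a real LLM API call; returns a canned response.
    return '{"disease": "Marfan syndrome", "genes": ["FBN1"], "phenotypes": ["Arachnodactyly"]}'

def extract(schema, text):
    """Query the model and check the response conforms to the schema's keys."""
    data = json.loads(call_llm(build_prompt(schema, text)))
    missing = set(schema) - set(data)
    if missing:
        raise ValueError(f"model response missing fields: {missing}")
    return data

print(extract(SCHEMA, "A patient with Marfan syndrome (FBN1) and arachnodactyly."))
```

The key design point the abstract highlights is that the schema, not training data, drives the extraction: swapping in a different (possibly nested) schema changes the prompt and the validation without any retraining.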


Objective: Clinical deep phenotyping and phenotype annotation play a critical role both in the diagnosis of patients with rare disorders and in building computationally tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (usually supported by machine learning methods) to curate patient profiles or existing scientific literature. Given the significant shift toward large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation.


Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods.
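The random-walk-based methods the abstract mentions start by generating walks over the graph, which are then fed to a skip-gram-style model to learn embeddings. The sketch below shows only that walk-generation step, in plain Python on a tiny hypothetical graph; GRAPE's actual implementation is a heavily optimised, parallel core with specialised data structures, which this toy loop does not attempt to reproduce.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks over an adjacency-list graph.

    A toy version of the walk-generation step behind DeepWalk-style
    embeddings: from each node, repeatedly hop to a random neighbour.
    """
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj[walk[-1]]
                if not neighbours:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

# Hypothetical three-node graph.
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
print(random_walks(adj, walk_length=4, walks_per_node=2))
```

At the scale the abstract describes (millions of nodes, billions of edges), the cost of exactly this inner loop is what motivates GRAPE's specialised data structures and parallel implementation.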


The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR-bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability, and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability, and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility, and expressivity in an increasingly decentralised genomic data ecosystem.


Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research.


Background: Navigating the clinical literature to determine the optimal clinical management for rare diseases presents significant challenges. We introduce the Medical Action Ontology (MAxO), an ontology specifically designed to organize medical procedures, therapies, and interventions.

Methods: MAxO incorporates logical structures that link MAxO terms to numerous other ontologies within the OBO Foundry.


The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English.
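One simple family of the semantic similarity measures mentioned above compares two ontology terms by the overlap of their ancestor sets in the is-a hierarchy. The sketch below computes a Jaccard ancestor-overlap score on a miniature hypothetical hierarchy; it is a simplified stand-in for the measures used with the HPO (such as Resnik similarity, which additionally weights ancestors by information content), and the term names are invented, not real HPO identifiers.

```python
# Miniature hypothetical is-a hierarchy: term -> list of parent terms.
PARENTS = {
    "seizure": ["neuro_abnormality"],
    "ataxia": ["neuro_abnormality"],
    "neuro_abnormality": ["phenotypic_abnormality"],
    "phenotypic_abnormality": [],
}

def ancestors(term):
    """Return the term plus all of its transitive is-a ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def jaccard_similarity(t1, t2):
    """Ancestor-overlap similarity: |shared ancestors| / |all ancestors|."""
    a, b = ancestors(t1), ancestors(t2)
    return len(a & b) / len(a | b)

print(jaccard_similarity("seizure", "ataxia"))  # prints 0.5
```

Terms that share a nearby common ancestor score higher than distantly related ones, which is what lets such measures rank candidate diseases by how well a patient's HPO profile matches known disease profiles.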
