Publications by authors named "Tudor Groza"

Article Synopsis
  • The GA4GH Phenopacket Schema, released in 2022 and approved as a standard by ISO, allows the sharing of clinical and genomic data, including phenotypic descriptions and genetic information, to aid in genomic diagnostics.
  • Phenopacket Store Version 0.1.19 offers a collection of 6668 phenopackets linked to various diseases and genes, making it a crucial resource for testing algorithms and software in genomic research.
  • This collection represents the first extensive case-level, standardized phenotypic information sourced from medical literature, supporting advancements in diagnostic genomics and machine learning applications.

Genetic diagnosis plays a crucial role in rare diseases, particularly with the increasing availability of emerging and accessible treatments. The International Rare Diseases Research Consortium (IRDiRC) has set its primary goal as: "Ensuring that all patients who present with a suspected rare disease receive a diagnosis within one year if their disorder is documented in the medical literature". Despite significant advances in genomic sequencing technologies, more than half of the patients with suspected Mendelian disorders remain undiagnosed.


Motivation: Human Phenotype Ontology (HPO)-based phenotype concept recognition (CR) underpins a faster and more effective mechanism to create patient phenotype profiles or to document novel phenotype-centred knowledge statements. While the increasing adoption of large language models (LLMs) for natural language understanding has led to several LLM-based solutions, we argue that their intrinsically resource-intensive nature makes them unsuitable for realistic management of the phenotype CR lifecycle. Consequently, we propose to go back to basics and adopt a dictionary-based approach that enables both an immediate refresh of the ontological concepts and efficient re-analysis of past data.

Article Synopsis
  • Rare diseases impact over 300 million people globally and are becoming a priority in global health discussions, recognized by the UN and WHO initiatives.
  • Individuals with rare diseases often struggle with accessing essential health services like screening, diagnosis, and treatment, highlighting the importance of awareness and education in primary healthcare.
  • The International Rare Diseases Research Consortium (IRDiRC) is forming a task force to explore ways to enhance the role of primary healthcare providers in overcoming the challenges faced by those with rare diseases.

The Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema was released in 2022 and approved by ISO as a standard for sharing clinical and genomic information about an individual, including phenotypic descriptions, numerical measurements, genetic information, diagnoses, and treatments. A phenopacket can be used as an input file for software that supports phenotype-driven genomic diagnostics and for algorithms that facilitate patient classification and stratification for identifying new diseases and treatments. There has been a great need for a collection of phenopackets to test software pipelines and algorithms.


Objective: Clinical deep phenotyping and phenotype annotation play a critical role both in the diagnosis of patients with rare disorders and in building computationally tractable knowledge in the rare disorders field. These processes rely on using ontology concepts, often from the Human Phenotype Ontology, in conjunction with a phenotype concept recognition task (usually supported by machine learning methods) to curate patient profiles or existing scientific literature. With the significant shift towards large language models (LLMs) for most NLP tasks, we examine the performance of the latest Generative Pre-trained Transformer (GPT) models underpinning ChatGPT as a foundation for the tasks of clinical phenotyping and phenotype annotation.


The diagnostic odyssey for people living with rare diseases (PLWRD) is often prolonged for myriad reasons, including an initial failure to consider rare disease and challenges in systemically and systematically identifying and tracking undiagnosed diseases across the diagnostic journey. This often results in isolation, uncertainty, delayed access to targeted treatments and an increased risk of complications, with significant consequences for patient and family wellbeing. This article aims to highlight key time points at which to consider a rare disease diagnosis, along with elements to consider in a potential operational classification for undiagnosed rare diseases during the diagnostic odyssey.


Motivation: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other nonstandard ways of representing clinical concepts.

Results: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.
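The screening stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the k-mer length of 3, and the 0.5 threshold are assumptions chosen for illustration, and the second stage (scoring candidates against typical spelling-error patterns) is omitted entirely.

```python
from collections import Counter


def kmers(text, k=3):
    """Count the overlapping character k-mers of a lower-cased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_overlap(label, text, k=3):
    """Fraction of the label's k-mers that also occur in the text."""
    label_kmers, text_kmers = kmers(label, k), kmers(text, k)
    shared = sum((label_kmers & text_kmers).values())  # multiset intersection
    total = max(sum(label_kmers.values()), 1)  # avoid division by zero
    return shared / total


def screen(text, labels, k=3, threshold=0.5):
    """Return (label, score) pairs whose k-mer overlap meets the threshold.

    Hypothetical screening step only: a real system would pass these
    candidates on to a more careful scoring stage.
    """
    hits = []
    for label in labels:
        score = kmer_overlap(label, text, k)
        if score >= threshold:
            hits.append((label, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# A misspelled mention still shares most k-mers with the intended label.
candidates = screen("seizurs", ["Seizure", "Strabismus"])
```

Because the comparison works on short character fragments rather than whole words, a single spelling error removes only a few k-mers, so the intended label can still pass the screen.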


The Human Phenotype Ontology (HPO) is a widely used resource that comprehensively organizes and defines the phenotypic features of human disease, enabling computational inference and supporting genomic and phenotypic analyses through semantic similarity and machine learning algorithms. The HPO has widespread applications in clinical diagnostics and translational research, including genomic diagnostics, gene-disease discovery, and cohort analytics. In recent years, groups around the world have developed translations of the HPO from English to other languages, and the HPO browser has been internationalized, allowing users to view HPO term labels and in many cases synonyms and definitions in ten languages in addition to English.

Article Synopsis
  • Current large language models like GPT-4 struggle with accurately diagnosing medical conditions from structured data extracted from clinical texts, achieving correct diagnoses only 5.3-17.6% of the time.
  • The study highlighted that the performance of the prompts generated from structured data was significantly worse than from original narrative texts, indicating the complexity of clinical language.
  • There is a need for further research to improve prompt creation techniques using common clinical data to enhance the effectiveness of AI in supporting medical diagnostics.
Article Synopsis
  • Phenopacket-tools is an open-source Java library that makes it easier to build, convert, and validate GA4GH phenopackets by providing user-friendly tools and predefined components.
  • The library supports developers in standardizing the collection and sharing of clinical data to enhance genomic diagnostics, research, and precision medicine, with detailed documentation and tutorial resources available online.

Experiments in which data are collected by multiple independent resources, including multicentre data, different laboratories within the same centre or with different operators, are challenging in design, data collection and interpretation. Indeed, inconsistent results across the resources are possible. In this paper, we propose a statistical solution for the problem of multi-resource consensus inferences when statistical results from different resources show variation in magnitude, directionality, and significance.


The Global Alliance for Genomics and Health (GA4GH) is developing a suite of coordinated standards for genomics for healthcare. The Phenopacket is a new GA4GH standard for sharing disease and phenotype information that characterizes an individual person, linking that individual to detailed phenotypic descriptions, genetic information, diagnoses, and treatments. A detailed example is presented that illustrates how to use the schema to represent the clinical course of a patient with retinoblastoma, including demographic information, the clinical diagnosis, phenotypic features and clinical measurements, an examination of the extirpated tumor, therapies, and the results of genomic analysis.


PDCM Finder (www.cancermodels.org) is a cancer research platform that aggregates clinical, genomic and functional data from patient-derived xenografts, organoids and cell lines.


The International Mouse Phenotyping Consortium (IMPC; https://www.mousephenotype.org/) web portal makes available curated, integrated and analysed knockout mouse phenotyping data generated by the IMPC project consisting of 85M data points and over 95,000 statistically significant phenotype hits mapped to human diseases.


The Global Alliance for Genomics and Health (GA4GH) aims to accelerate biomedical advances by enabling the responsible sharing of clinical and genomic data through both harmonized data aggregation and federated approaches. The decreasing cost of genomic sequencing (along with other genome-wide molecular assays) and increasing evidence of its clinical utility will soon drive the generation of sequence data from tens of millions of humans, with increasing levels of diversity. In this perspective, we present the GA4GH strategies for addressing the major challenges of this data revolution.


Exome sequencing has enabled molecular diagnoses for rare disease patients but often with initial diagnostic rates of ~25-30%. Here we develop a robust computational pipeline to rank variants for reassessment of unsolved rare disease patients. A comprehensive web-based patient report is generated in which all deleterious variants can be filtered by gene, variant characteristics, OMIM disease and Phenolyzer scores, and all are annotated with an ACMG classification and links to ClinVar.

Article Synopsis
  • The 2015 BioHackathon brought together scientists and developers to create tools for sharing and reusing biological data.
  • They talked about problems with how to represent and use different kinds of biological information, like DNA and proteins.
  • The group shared their progress in fixing these issues and discussed future goals to improve how researchers can use biological data in their work.

A lack of robust knowledge of the number of rare diseases and the number of people affected by them limits the development of approaches to ameliorate the substantial cumulative burden of rare diseases. Here, we call for coordinated efforts to more precisely define rare diseases.


Background: This study provides an integrated assessment of the economic and social impacts of genomic sequencing for the detection of monogenic disorders resulting in intellectual disability (ID).

Methods: Multiple knowledge bases were cross-referenced and analysed to compile a reference list of monogenic disorders associated with ID. Multiple literature searches were used to quantify the health and social costs for the care of people with ID.


In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search.


The Human Phenotype Ontology (HPO) is a standardized set of phenotypic terms that are organized in a hierarchical fashion. It is a widely used resource for capturing human disease phenotypes for computational analysis to support differential diagnostics. The HPO is frequently used to create a set of terms that accurately describe the observed clinical abnormalities of an individual being evaluated for suspected rare genetic disease.
