Background: Given the geographical sparsity of rare diseases (RDs), assembling a cohort is often challenging. Common data models (CDMs) can harmonize disparate data sources into a basis for decision support systems and artificial intelligence-based studies, leading to new insights in the field. This work seeks to support the design of large-scale multi-center studies for RDs.

Methods: In an interdisciplinary group, we iteratively derived a list of RD data elements in three medical domains (endocrinology, gastroenterology, and pneumonology) based on specialist knowledge and clinical guidelines. We then defined an RD data structure that covered all of these data elements and built Extract, Transform, Load (ETL) processes to transfer the structure to a joint CDM. To ensure the interoperability of the developed CDM and its subsequent use for further RD domains, we ultimately mapped it to the Observational Medical Outcomes Partnership (OMOP) CDM. As a proof of concept, we then included a fourth domain, hematology, and mapped an acute myeloid leukemia (AML) dataset to the developed CDM.
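The ETL idea in the Methods can be illustrated with a minimal sketch: source registry rows are extracted, transformed into OMOP-style `person` and `condition_occurrence` rows, and loaded into a database. This is not the authors' RD-CDM or the MII ETL. The source columns, the `RD-000x` diagnosis codes, the `HYPOTHETICAL_CONCEPTS` lookup (with a placeholder concept ID), and the `extract`/`transform`/`load` helpers are all hypothetical. Only a reduced subset of the OMOP columns is created, in an in-memory SQLite database rather than a full OMOP instance. The gender concept IDs 8507 (male) and 8532 (female) are OMOP standard concepts.

```python
import csv
import io
import sqlite3
from datetime import date

# Hypothetical source extract; column names and RD-000x codes are illustrative only.
SOURCE_CSV = """patient_id,sex,birth_date,diagnosis_code,diagnosis_date
P001,female,1980-04-12,RD-0001,2015-06-01
P001,female,1980-04-12,RD-0002,2018-02-20
P002,male,1975-11-03,RD-0001,2020-09-15
"""

# OMOP standard gender concepts: 8507 = MALE, 8532 = FEMALE.
GENDER_CONCEPTS = {"male": 8507, "female": 8532}

# Hypothetical source-code -> OMOP concept_id lookup (placeholder ID; RD-0002 left unmapped).
HYPOTHETICAL_CONCEPTS = {"RD-0001": 2000000001}


def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows, concept_lookup):
    """Transform: build OMOP-style person and condition_occurrence tuples."""
    person_ids = {}
    persons, conditions = [], []
    for row in rows:
        source_id = row["patient_id"]
        if source_id not in person_ids:
            # Assign a surrogate integer person_id the first time a patient appears.
            person_ids[source_id] = len(person_ids) + 1
            born = date.fromisoformat(row["birth_date"])
            persons.append((
                person_ids[source_id],
                GENDER_CONCEPTS.get(row["sex"].lower(), 0),  # 0 = no matching concept
                born.year,
                0,  # race_concept_id: not captured in this hypothetical source
                0,  # ethnicity_concept_id: not captured either
                source_id,
            ))
        conditions.append((
            len(conditions) + 1,
            person_ids[source_id],
            concept_lookup.get(row["diagnosis_code"], 0),  # unmapped codes -> 0
            row["diagnosis_date"],
            0,  # condition_type_concept_id: placeholder for the provenance type
            row["diagnosis_code"],  # keep the original code as the source value
        ))
    return persons, conditions


def load(conn, persons, conditions):
    """Load: write the rows into a reduced subset of the OMOP tables."""
    conn.execute(
        "CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER,"
        " year_of_birth INTEGER, race_concept_id INTEGER,"
        " ethnicity_concept_id INTEGER, person_source_value TEXT)"
    )
    conn.execute(
        "CREATE TABLE condition_occurrence (condition_occurrence_id INTEGER PRIMARY KEY,"
        " person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT,"
        " condition_type_concept_id INTEGER, condition_source_value TEXT)"
    )
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)", persons)
    conn.executemany("INSERT INTO condition_occurrence VALUES (?, ?, ?, ?, ?, ?)", conditions)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    persons, conditions = transform(extract(SOURCE_CSV), HYPOTHETICAL_CONCEPTS)
    load(conn, persons, conditions)
    print(conn.execute("SELECT COUNT(*) FROM person").fetchone()[0])  # 2
```

Unmapped source codes receive concept ID 0, the OMOP convention for "no matching concept", while the original code is kept in the `*_source_value` column so that the mapping can be revisited later.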

Results: We developed an OMOP-based rare diseases common data model (RD-CDM) using data elements from the three domains (endocrinology, gastroenterology, and pneumonology) and tested it with data from the hematology domain. The total study cohort included 61,697 patients. After aligning our modules with the Medical Informatics Initiative (MII) Core Dataset (CDS) modules, we leveraged the MII's existing ETL process. This enabled the seamless transfer of the demographics, diagnoses, procedures, laboratory results, and medication modules from our RD-CDM to OMOP. For phenotypes and genotypes, we developed a second ETL process. Finally, we derived lessons learned for customizing our RD-CDM for different RDs.

Discussion: This work can serve as a blueprint for other domains, because its modular structure can be extended to novel data types. Reaching a comprehensive CDM requires an interdisciplinary group of stakeholders who actively support the project's progress.

Conclusion: The customized data structure underlying our RD-CDM can be used to perform multi-center studies that test data-driven hypotheses on a larger scale and to take advantage of the analytical tools offered by the Observational Health Data Sciences and Informatics (OHDSI) community.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11325822
DOI: http://dx.doi.org/10.1186/s13023-024-03312-9
