Common data models standardize the structure and semantics of health datasets, enabling reproducibility and large-scale studies that draw on data from multiple locations and settings. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is one of the leading common data models. While there is a strong incentive to convert datasets to OMOP, the conversion is time- and resource-intensive, leaving the research community in need of tools for mapping data to OMOP. We propose an extract, transform, load (ETL) framework that is metadata-driven and generic across source datasets. The ETL framework uses a new data manipulation language (DML) that organizes SQL snippets in YAML. Our framework includes a compiler that converts YAML files containing mapping logic into an ETL script. The ETL framework is accessible through a web application, which lets users upload and edit YAML files in a web editor and obtain an ETL SQL script for use in development environments. The structure of the DML is designed to maximize readability, ease of refactoring, and maintainability, while minimizing technical debt and standardizing how ETL operations for mapping to OMOP are written. Our framework also supports transparency of the mapping process and reuse by different institutions.
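To make the idea of compiling metadata into an ETL script concrete, the following is a minimal sketch and not the framework's actual DML or compiler. The mapping layout (`target`, `source`, `columns`), the source table `src.patients`, its column names, and the function names are all hypothetical. A plain Python dictionary stands in for the result of parsing a YAML file, so the example needs no third-party parser. Only the OMOP `person` column names and the standard gender concept IDs (8507 for male, 8532 for female) come from the OMOP CDM itself.

```python
from typing import Dict, List

# Hypothetical mapping, shaped like the result of parsing a YAML file such as:
#   target: person
#   source: src.patients
#   columns:
#     person_id: patient_id
#     ...
# The key names and the source table are assumptions made for illustration.
PERSON_MAPPING: Dict = {
    "target": "person",
    "source": "src.patients",
    "columns": {
        "person_id": "patient_id",
        "gender_concept_id": (
            "CASE sex WHEN 'M' THEN 8507 WHEN 'F' THEN 8532 ELSE 0 END"
        ),
        "year_of_birth": "EXTRACT(YEAR FROM birth_date)",
    },
}


def compile_mapping(spec: Dict) -> str:
    """Turn one mapping into an INSERT ... SELECT statement.

    Each entry in spec["columns"] pairs a target column with the SQL
    snippet that computes it from the source table.
    """
    target_columns = ", ".join(spec["columns"])
    select_items = ",\n".join(
        f"    {snippet} AS {column}"
        for column, snippet in spec["columns"].items()
    )
    return (
        f"INSERT INTO {spec['target']} ({target_columns})\n"
        f"SELECT\n{select_items}\n"
        f"FROM {spec['source']};"
    )


def compile_script(specs: List[Dict]) -> str:
    """Concatenate the statements for several mappings into one script."""
    return "\n\n".join(compile_mapping(spec) for spec in specs)


if __name__ == "__main__":
    print(compile_script([PERSON_MAPPING]))
```

Keeping each SQL snippet next to the target column it fills is what lets a reviewer read the mapping without tracing a hand-written script. The sketch covers only three of the `person` columns, so a real mapping would have to supply the remaining required ones.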


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9000122 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0266911 (PLOS)


