The OMOP common data model in Australian primary care data: Building a quality research ready harmonised dataset.

PLoS One

Health & Biomedical Research Information Technology Unit (HaBIC R2), Department of General Practice and Primary Care, Faculty of Medicine, Dentistry & Health Sciences, The University of Melbourne, Parkville, Victoria, Australia.

Published: April 2024

AI Article Synopsis

  • The study examines how converting electronic medical records (EMRs) into the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) improves the ability to analyze health data for research and policy-making.
  • The OMOP-CDM standardizes health data by harmonizing terminologies and coding systems, enhancing research capacity with shared analytical techniques and enabling drug safety surveillance across multiple regions.
  • The research faced challenges in mapping free-text EMR terms to standard vocabularies, necessitating manual assignment for frequently appearing terms while ensuring over 95% of records are linked to an approved vocabulary like SNOMED.

Article Abstract

Background: The use of routinely collected health data for secondary research purposes is increasingly recognised as a methodology that advances medical research, improves patient outcomes, and guides policy. This secondary data, as found in electronic medical records (EMRs), can be optimised through conversion into a uniform data structure to enable analysis alongside other comparable health metric datasets. This can be achieved with the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), which employs a standardised vocabulary to facilitate systematic analysis across various observational databases. The concept behind the OMOP-CDM is the conversion of data into a common format through the harmonisation of terminologies, vocabularies, and coding schemes within a unique repository. The OMOP model enhances research capacity through the development of shared analytic and prediction techniques; pharmacovigilance for the active surveillance of drug safety; and 'validation' analyses across multiple institutions in Australia, the United States, Europe, and the Asia Pacific. In this research, we aim to investigate the use of the open-source OMOP-CDM in the PATRON primary care data repository.

Methods: We used standard structured query language (SQL) to construct extract, transform, and load (ETL) scripts that convert the data to the OMOP-CDM. The process of mapping distinct free-text terms extracted from various EMRs presented a substantial challenge, as many terms could not be automatically matched to standard vocabularies through direct text comparison. As a result, a number of terms required manual assignment. To address this issue, we implemented a strategy where our clinical mappers were instructed to focus only on terms that appeared with sufficient frequency. We established a specific threshold value for each domain, ensuring that more than 95% of all records were linked to an approved vocabulary like SNOMED once appropriate mapping was completed. To assess the data quality of the resultant OMOP dataset, we utilised the OHDSI Data Quality Dashboard (DQD) to evaluate the plausibility, conformity, and comprehensiveness of the data in the PATRON repository according to the Kahn framework.
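
The frequency-threshold strategy described in the Methods can be illustrated with a minimal sketch. The abstract does not give the authors' actual procedure, so everything here is an assumption for illustration: the function name `frequency_threshold`, the input shape (a mapping from free-text term to record count within one domain), and the example counts are all hypothetical. The sketch picks the lowest term frequency that mappers would need to work down to so that more than 95% of records are covered.

```python
def frequency_threshold(term_counts, target_coverage=0.95):
    """Return (threshold, terms_to_map) for one domain.

    term_counts: dict of free-text term -> number of records using it
                 (hypothetical input shape, not taken from the paper).
    threshold:   the smallest record count a term must have to be sent
                 for manual mapping so that coverage exceeds the target.
    """
    total = sum(term_counts.values())
    if total == 0:
        return 0, []

    # Walk terms from most to least frequent, accumulating record coverage.
    ordered = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    covered = 0
    threshold = ordered[-1][1]  # fallback: every term must be mapped
    for _, count in ordered:
        covered += count
        if covered / total > target_coverage:
            threshold = count
            break

    # Every term at or above the threshold is mapped (ties included).
    terms_to_map = [term for term, count in ordered if count >= threshold]
    return threshold, terms_to_map


# Illustrative counts only; these are not data from the study.
example_counts = {
    "hypertension": 600,
    "htn": 250,
    "high bp": 120,
    "hypertenson": 20,
    "bp high??": 10,
}
example_threshold, example_terms = frequency_threshold(example_counts)
```

In this toy example, mapping the three most frequent terms covers 970 of 1,000 records, so the rare misspellings can be left unmapped while still exceeding the 95% target.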

Results: Across three primary care EMR systems we converted data on 2.03 million active patients to version 5.4 of the OMOP common data model. The DQD assessment involved a total of 3,570 individual evaluations. Each evaluation compared the outcome against a predefined threshold. A 'FAIL' occurred when the percentage of non-compliant rows exceeded the specified threshold value. In this assessment of the primary care OMOP database, we achieved an overall pass rate of 97%.
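
The pass/fail rule stated in the Results can be sketched as follows. This is not the Data Quality Dashboard's own code or API; the function names `check_status` and `pass_rate`, their parameters, and the handling of an empty denominator are assumptions made for illustration. The sketch only encodes the rule given above: a check fails when the percentage of non-compliant rows exceeds its threshold.

```python
def check_status(num_violated_rows, num_denominator_rows, threshold_pct):
    """Return 'FAIL' if the share of non-compliant rows exceeds the threshold.

    threshold_pct is a percentage (e.g. 5 means 5%). Treating an empty
    denominator as a pass is an assumption of this sketch.
    """
    if num_denominator_rows == 0:
        return "PASS"
    pct_violated = 100.0 * num_violated_rows / num_denominator_rows
    return "FAIL" if pct_violated > threshold_pct else "PASS"


def pass_rate(statuses):
    """Overall pass rate, as a percentage, across a list of check statuses."""
    if not statuses:
        raise ValueError("no checks supplied")
    passed = sum(1 for status in statuses if status == "PASS")
    return 100.0 * passed / len(statuses)


# Illustrative values only; these are not results from the study.
status_over = check_status(6, 100, 5)    # 6% non-compliant, 5% threshold
status_equal = check_status(5, 100, 5)   # exactly at the threshold
example_rate = pass_rate(["PASS"] * 97 + ["FAIL"] * 3)
```

Note that a check sitting exactly at its threshold passes under this rule, because a failure requires the threshold to be exceeded.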

Conclusion: The OMOP CDM's widespread international use, support, and training provide a well-established pathway for data standardisation in collaborative research. Its compatibility allows the sharing of analysis packages across local and international research groups, which facilitates rapid and reproducible data comparisons. A suite of open-source tools, including the OHDSI Data Quality Dashboard (Version 1.4.1), supports the model. Its simplicity and standards-based approach facilitate adoption and integration into existing data processes.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11025850 (PMC)
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0301557 (PLOS)
