Data-driven approach for creating synthetic electronic medical records.

BMC Med Inform Decis Mak

Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd, Laurel, MD 20723-6099, USA.

Published: October 2010

Background: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed.

Methods: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method developed has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population.
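The three steps above can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' implementation: the record layout, the `SyntheticPatient` class, the function names, and the toy data are all assumptions made for this example, and the real method draws care patterns from actual EMR data and includes expert validation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SyntheticPatient:
    """Hypothetical container for one synthetic patient (not from the paper)."""
    patient_id: str
    age: int
    sex: str
    syndrome: str
    record: list = field(default_factory=list)


def generate_patients(n, syndrome, age_range, rng):
    """Step 1: create synthetic identities and basic information."""
    low, high = age_range
    return [
        SyntheticPatient(
            patient_id=f"SYN-{i:05d}",
            age=rng.randint(low, high),
            sex=rng.choice(["F", "M"]),
            syndrome=syndrome,
        )
        for i in range(n)
    ]


def find_care_patterns(real_records, syndrome):
    """Step 2: collect care patterns seen in real records for a similar problem."""
    return [r["events"] for r in real_records if r["syndrome"] == syndrome]


def adapt_pattern(patient, pattern, visit_day):
    """Step 3: copy a care pattern onto a synthetic patient, re-dating each event."""
    patient.record = [
        {
            "patient_id": patient.patient_id,
            "day": visit_day + event["day_offset"],
            "type": event["type"],
            "detail": event["detail"],
        }
        for event in pattern
    ]
    return patient


def build_synthetic_emrs(real_records, n, syndrome, age_range, seed=0):
    """Run the three steps in order with a seeded random generator."""
    rng = random.Random(seed)
    patterns = find_care_patterns(real_records, syndrome)
    if not patterns:
        raise ValueError(f"no care patterns found for {syndrome!r}")
    patients = generate_patients(n, syndrome, age_range, rng)
    return [
        adapt_pattern(p, rng.choice(patterns), rng.randint(0, 30))
        for p in patients
    ]


# Toy stand-in for real EMR data; the values are invented for illustration.
TOY_RECORDS = [
    {"syndrome": "respiratory", "events": [
        {"day_offset": 0, "type": "visit", "detail": "outpatient"},
        {"day_offset": 0, "type": "lab_order", "detail": "CBC"},
        {"day_offset": 1, "type": "radiology_order", "detail": "chest X-ray"},
    ]},
    {"syndrome": "respiratory", "events": [
        {"day_offset": 0, "type": "visit", "detail": "emergency"},
        {"day_offset": 2, "type": "lab_result", "detail": "culture pending"},
    ]},
]

emrs = build_synthetic_emrs(TOY_RECORDS, n=5, syndrome="respiratory", age_range=(4, 11))
```

Each element of `emrs` is a synthetic patient whose `record` lists visit, laboratory, and radiology events re-dated relative to a randomly chosen visit day. Sampling a whole pattern from one real patient, rather than mixing events across patients, is a simplification chosen here to keep each synthetic record internally consistent.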

Results: We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results, for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3000 patients in the 4- to 11-year-old age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs, and the errors were subsequently rectified.

Conclusions: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, prescription orders). The pilot synthetic outbreak records were for tularemia, but our approach may be adapted to other infectious diseases. The pilot synthetic background records were for the 4- to 11-year-old age group. The adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2972239
DOI: http://dx.doi.org/10.1186/1472-6947-10-59

Similar Publications

Rupture prediction is crucial for precise treatment and follow-up management of patients with intracranial aneurysms (IAs). Many machine learning (ML) methods have been proposed to improve rupture prediction by leveraging electronic medical records (EMRs); however, data scarcity and category imbalance strongly influence performance. Thus, we propose a novel data synthesis method i.

Objective: Our objective is to develop and validate TrajVis, an interactive tool that assists clinicians in using artificial intelligence (AI) models to leverage patients' longitudinal electronic medical records (EMRs) for personalized precision management of chronic disease progression.

Materials And Methods: We first perform requirement analysis with clinicians and data scientists to determine the visual analytics tasks of the TrajVis system as well as its design and functionalities. A graph AI model for chronic kidney disease (CKD) trajectory inference named DisEase PrOgression Trajectory (DEPOT) is used for system development and demonstration.

To evaluate the comparability of a probable clinical trial (CT) cohort derived from electronic medical records (EMR) data with a real-world cohort treated with the same therapy and identified using the same inclusion and exclusion criteria to emulate an external control. We utilized de-identified patient-level structured data sourced from EMRs. We then compared patterns of overall survival (OS) between probable CT patients and those drawn from non-contemporaneous real-world data (RWD) using a two-sided log-rank test, hazard ratios (HRs) from a Cox proportional-hazards model, and Kaplan-Meier (KM) survival curves.

Background: Extensor mechanism disruption (EMD) following total knee arthroplasty (TKA) is a devastating problem commonly treated with allograft or synthetic reconstruction. Understanding of reconstruction success rates and patient recorded outcomes is lacking.

Methods: Patients with an EMD after TKA who underwent mesh or whole-extensor allograft reconstruction between 2011 and 2019, with a minimum 2-year follow-up, were reviewed at two tertiary care centers.

Machine learning (ML) and Natural Language Processing (NLP) have achieved remarkable success in many fields and have brought new opportunities and high expectations to the analysis of medical data, of which the most common type is massive free-text electronic medical records (EMR). However, free-text EMRs lack consistent standards, are rich in private information, and are limited in availability. Also, it is often hard to obtain a balanced number of samples for the types of diseases under study.
