Combining structured and unstructured data in EMRs to create clinically-defined EMR-derived cohorts.

Charmaine S Tam, Janice Gullick, Aldo Saavedra, Stephen T Vernon, Gemma A Figtree, Clara K Chow, Michelle Cretikos, Richard W Morris, Maged William, Jonathan Morris, David Brieger

BMC Med Inform Decis Mak

Department of Cardiology, Concord Hospital, Sydney, Australia.

Published: March 2021

Background: Few studies have described how production EMR systems can be systematically queried to identify clinically-defined populations, and only a limited number have used free text in this process. This study aims to provide a generalisable methodology for constructing clinically-defined EMR-derived patient cohorts using structured and unstructured data in EMRs.

Methods: Patients with possible acute coronary syndrome (ACS) were used as an exemplar. Cardiologists defined clinical criteria for patients presenting with possible ACS. These were mapped to data tables within the production EMR system, creating seven inclusion criteria comprising structured data fields (orders and investigations, procedures, scanned electrocardiogram (ECG) images, and diagnostic codes) and unstructured clinical documentation. Data were extracted from two local health districts (LHDs) in Sydney, Australia. Outcome measures included the relative contribution of individual inclusion criteria to the identification of eligible encounters, comparisons between inclusion criteria, and the consistency of data extracts across years and LHDs.
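The cohort logic described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: it assumes an encounter is included when it meets at least one criterion, and the `Encounter` fields, the criterion names, and the keyword list are all hypothetical, since the abstract does not enumerate the seven criteria or the search terms.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword list; the abstract does not list the study's terms.
ACS_KEYWORDS = re.compile(r"\b(chest pain|angina|stemi|nstemi)\b", re.IGNORECASE)


@dataclass
class Encounter:
    # Illustrative fields standing in for EMR data tables (assumed names).
    encounter_id: str
    has_ecg_image: bool = False
    has_order: bool = False           # "orders and investigations"
    has_procedure: bool = False
    has_diagnosis_code: bool = False  # e.g. a relevant ICD-10/SNOMED CT code
    note_text: str = ""               # unstructured clinical documentation


def criteria_met(enc):
    """Return the set of illustrative inclusion criteria an encounter satisfies."""
    met = set()
    if enc.has_ecg_image:
        met.add("ecg_image")
    if enc.has_order:
        met.add("order")
    if enc.has_procedure:
        met.add("procedure")
    if enc.has_diagnosis_code:
        met.add("diagnosis_code")
    if ACS_KEYWORDS.search(enc.note_text):
        met.add("free_text_keyword")
    return met


def build_cohort(encounters):
    """Keep encounters meeting at least one criterion (assumed union logic)."""
    cohort = {}
    for enc in encounters:
        met = criteria_met(enc)
        if met:
            cohort[enc.encounter_id] = met
    return cohort


def contribution(cohort):
    """Fraction of cohort encounters in which each criterion is present."""
    total = len(cohort)
    counts = {}
    for met in cohort.values():
        for name in met:
            counts[name] = counts.get(name, 0) + 1
    return {name: n / total for name, n in counts.items()} if total else {}
```

Because one encounter can satisfy several criteria at once, the per-criterion fractions need not sum to 1, which is consistent with the overlapping percentages reported in the Results.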

Results: Among 802,742 encounters in a 5-year dataset (1/1/13-30/12/17), the presence of an ECG image (54.8% of encounters) and symptoms and keywords in clinical documentation (41.4-64.0%) were used most often to identify presentations of possible ACS. Orders and investigations (27.3%) and procedures (1.4%) were less often present for identified presentations. Relevant ICD-10/SNOMED CT codes were present for 3.7% of identified encounters. Similar trends were seen when the two LHDs were examined separately, and across years.

Conclusions: Constructing clinically-defined EMR-derived cohorts by combining structured and unstructured data during cohort identification is a necessary prerequisite for the critical validation work required to develop real-time clinical decision support and learning health systems.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7938556
DOI: http://dx.doi.org/10.1186/s12911-021-01441-w

Similar Publications

Geographic variation in delay to surgical treatment among non-small cell lung cancer patients.

Lung Cancer

January 2025

School of Public Health and Preventive Medicine, Monash University, Melbourne, Australia.

Getayeneh Antehunegn Tesema, Rob G Stirling, Win Wah, Zemenu Tadesse Tessema, Stephane Heritier

Article Synopsis

Delayed surgeries significantly increase the risk of disease progression and negative outcomes in lung cancer patients, particularly those with Non-Small Cell Lung Cancer (NSCLC).
The study analyzed data from 3,088 NSCLC patients, revealing that over 40% experienced delays in surgical treatment due to geographic variability and various risk factors.
Key factors contributing to these delays included advanced cancer stages, treatment at specific regional hospitals, existing health conditions, and diagnoses made during the COVID-19 pandemic.

Multimodal deep learning for predicting in-hospital mortality in heart failure patients using longitudinal chest X-rays and electronic health records.

Int J Cardiovasc Imaging

January 2025

Shanxi Cardiovascular Hospital, 18 Yifen Street, Taiyuan, 030024, Shanxi, China.

Dengao Li, Wen Xing, Jumin Zhao, Changcheng Shi, Fei Wang

Amid an aging global population, heart failure has become a leading cause of hospitalization among older people. Its high prevalence and mortality rates underscore the importance of accurate mortality prediction for swift disease progression assessment and better patient outcomes. The evolution of artificial intelligence (AI) presents new avenues for predicting heart failure mortality.

Identification of Naloxone in Emergency Medical Services Data Substantially Improves by Processing Unstructured Patient Care Narratives.

Prehosp Emerg Care

January 2025

Institute for Pharmaceutical Outcomes & Policy, Department of Pharmacy Practice and Science, College of Pharmacy, University of Kentucky, Lexington KY 40508, USA.

Daniel R Harris, Peter Rock, Nicholas Anthony, Dana Quesinberry, Chris Delcher

Objectives: Structured data fields, including medication fields involving naloxone, are routinely used to identify opioid overdoses in emergency medical services (EMS) data; between January 2021 and March 2024, there were approximately 1.2 million instances of naloxone administration in the United States.

Leveraging Natural Language Processing and Machine Learning Methods for Adverse Drug Event Detection in Electronic Health/Medical Records: A Scoping Review.

Drug Saf

January 2025

Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.

Su Golder, Dongfang Xu, Karen O'Connor, Yunwen Wang, Mahak Batra

Background: Natural language processing (NLP) and machine learning (ML) techniques may help harness unstructured free-text electronic health record (EHR) data to detect adverse drug events (ADEs) and thus improve pharmacovigilance. However, evidence of their real-world effectiveness remains unclear.

Objective: To summarise the evidence on the effectiveness of NLP/ML in detecting ADEs from unstructured EHR data and ultimately improve pharmacovigilance in comparison to other data sources.

Hybrid natural language processing tool for semantic annotation of medical texts in Spanish.

BMC Bioinformatics

January 2025

Centro de Salud Retiro, Hospital Universitario Gregorio Marañon, C/Lope de Rueda, 43, 28009, Madrid, Spain.

Leonardo Campillos-Llanos, Ana Valverde-Mateos, Adrián Capllonch-Carrión

Background: Natural language processing (NLP) enables the extraction of information embedded within unstructured texts, such as clinical case reports and trial eligibility criteria. By identifying relevant medical concepts, NLP facilitates the generation of structured and actionable data, supporting complex tasks like cohort identification and the analysis of clinical records. To accomplish those tasks, we introduce a deep learning-based and lexicon-based named entity recognition (NER) tool for texts in Spanish.
