Background: Natural language processing (NLP) of unstructured text from electronic medical records (EMR) can improve the characterization of COVID-19 signs and symptoms, but large-scale studies demonstrating the real-world application and validation of NLP for this purpose are limited.

Objective: This paper aims to assess the contribution of NLP to identifying COVID-19 signs and symptoms from EMR.

Methods: This study was conducted at Kaiser Permanente Southern California, a large integrated health care system, using data from all patients with positive SARS-CoV-2 laboratory tests from March 2020 to May 2021. An NLP algorithm was developed to extract 12 established signs and symptoms of COVID-19 from EMR free text: fever, cough, headache, fatigue, dyspnea, chills, sore throat, myalgia, anosmia, diarrhea, vomiting or nausea, and abdominal pain. The proportion of patients reporting each symptom and the corresponding onset dates were described before and after supplementing structured EMR data with NLP-extracted signs and symptoms. A random sample of 100 chart-reviewed and adjudicated SARS-CoV-2-positive cases was used to validate the algorithm's performance.
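
The validation step can be illustrated with a minimal sketch that computes sensitivity and specificity for a single symptom by comparing NLP output against chart-review labels. The function name `sensitivity_specificity` and the toy labels are hypothetical and are not taken from the study; the abstract does not describe how the authors implemented this calculation.

```python
def sensitivity_specificity(reference, predicted):
    """Return (sensitivity, specificity) for paired boolean labels.

    reference: chart-review labels (True = symptom documented)
    predicted: NLP labels for the same cases, in the same order
    """
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)          # true positives
    fn = sum(1 for r, p in pairs if r and not p)      # false negatives
    tn = sum(1 for r, p in pairs if not r and not p)  # true negatives
    fp = sum(1 for r, p in pairs if not r and p)      # false positives

    # Guard against an empty class rather than dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels for eight cases and one symptom (not study data).
chart_review = [True, True, True, False, False, False, False, True]
nlp_output = [True, True, False, False, False, True, False, True]

sens, spec = sensitivity_specificity(chart_review, nlp_output)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a setting like the one described, such a calculation would presumably be repeated for each of the 12 signs and symptoms across the 100 adjudicated cases.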

Results: A total of 359,938 patients (mean age 40.4 [SD 19.2] years; 191,630/359,938, 53% female) with confirmed SARS-CoV-2 infection were identified over the study period. The most common signs and symptoms identified through NLP-supplemented analyses were cough (220,631/359,938, 61%), fever (185,618/359,938, 52%), myalgia (153,042/359,938, 43%), and headache (144,705/359,938, 40%). The NLP algorithm identified an additional 55,568 (15%) symptomatic cases that had been classified as asymptomatic using structured data alone. The share of records for each symptom that was identified only through NLP varied by symptom, from 29% (63,742/220,631) of all records with cough to 64% (38,884/60,865) of all records with nausea or vomiting. Among the 295,305 symptomatic patients, the median time from symptom onset to testing was 3 days using structured data alone, whereas the NLP algorithm identified signs or symptoms approximately 1 day earlier. When validated against chart-reviewed cases, the NLP algorithm identified signs and symptoms with consistently high sensitivity (ranging from 87% to 100%) and specificity (ranging from 94% to 100%).
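
As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. The snippet below is only an illustrative sketch; the variable names and the helper `pct` are hypothetical, and the counts are copied from the Results text.

```python
TOTAL = 359_938  # patients with confirmed SARS-CoV-2 infection

# Counts reported against the full cohort.
counts = {
    "female": 191_630,
    "cough": 220_631,
    "fever": 185_618,
    "myalgia": 153_042,
    "headache": 144_705,
    "newly symptomatic via NLP": 55_568,
}


def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


for label, n in counts.items():
    print(f"{label}: {n:,}/{TOTAL:,} = {pct(n, TOTAL)}%")

# Share of each symptom's records contributed by NLP.
cough_share = pct(63_742, 220_631)
nausea_vomiting_share = pct(38_884, 60_865)
print(f"cough: {cough_share}%, nausea or vomiting: {nausea_vomiting_share}%")
```

The recomputed values agree with the rounded percentages stated in the abstract.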

Conclusions: These findings demonstrate that NLP can identify and characterize a broad set of COVID-19 signs and symptoms from unstructured EMR data with enhanced detail and timeliness compared with structured data alone.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9822566
DOI: http://dx.doi.org/10.2196/41529
