Machine Learning Methods to Extract Documentation of Breast Cancer Symptoms From Electronic Health Records.

J Pain Symptom Manage

Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts; Division of Palliative Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.

Published: June 2018

Context: Clinicians document cancer patients' symptoms in free-text format within electronic health record visit notes. Although symptoms are critically important to quality of life and often herald clinical status changes, computational methods to assess the trajectory of symptoms over time are woefully underdeveloped.

Objectives: To create machine learning algorithms capable of extracting patient-reported symptoms from free-text electronic health record notes.

Methods: The data set included 103,564 sentences obtained from the electronic clinical notes of 2695 breast cancer patients receiving paclitaxel-containing chemotherapy at two academic cancer centers between May 1996 and May 2015. We manually annotated 10,000 sentences and trained a conditional random field model to label each word as indicating an active symptom (positive label), the absence of a symptom (negative label), or no symptom at all (neutral label). The human-annotated sentences were divided into training, validation, and test data sets. Final model performance was measured on a held-out 20% test set that was not used in model development or tuning.
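
The data preparation described in the Methods can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say which word features the conditional random field used, so the feature set in `token_features` is hypothetical, as are the example sentence, the 10% validation fraction, and the reading of the 20% test set as a share of the 10,000 annotated sentences. Only the three label names and the 20% test fraction come from the abstract.

```python
import random

def token_features(tokens, i):
    """Build a feature dict for the token at position i.

    The features here are an assumed, generic choice for sequence
    labeling; the abstract does not list the features actually used.
    """
    word = tokens[i]
    return {
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_title": word.istitle(),
        # Neighboring words give context such as "denies" before a symptom.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def sentence_features(tokens):
    """One feature dict per token, the usual input shape for a CRF."""
    return [token_features(tokens, i) for i in range(len(tokens))]

def split_sentences(sentences, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle and split into train, validation, and test lists.

    test_frac=0.2 follows the abstract; val_frac=0.1 is an assumption.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical annotated sentence with one label per word.
tokens = ["Patient", "denies", "nausea", "but", "reports", "fatigue"]
labels = ["neutral", "neutral", "negative", "neutral", "neutral", "positive"]
features = sentence_features(tokens)

# Stand-in sentence ids for the 10,000 annotated sentences.
train, val, test = split_sentences(range(10000))
```

Pairs of `features` and `labels` like these are what a CRF toolkit would consume for training; the test split is set aside and touched only for the final evaluation.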

Results: The final model achieved precision of 0.82, 0.86, and 0.99 and recall of 0.56, 0.69, and 1.00 for positive, negative, and neutral symptom labels, respectively. The most common positive symptoms were pain, fatigue, and nausea. Machine-based labeling of 103,564 sentences took two minutes.
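
As a rough illustration of how per-label figures like these are computed, the sketch below defines precision and recall over word-level labels and then derives an F1 score from each reported pair. The toy `gold` and `pred` sequences are invented for the example, and the F1 values are simple arithmetic on the reported numbers; they are not reported in the abstract.

```python
def precision_recall(gold, pred, label):
    """Precision and recall for one label over aligned label sequences."""
    true_pos = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    predicted = sum(1 for p in pred if p == label)
    actual = sum(1 for g in gold if g == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Toy word-level labels (hypothetical, not study data).
gold = ["positive", "neutral", "negative", "positive", "neutral", "neutral"]
pred = ["positive", "neutral", "negative", "neutral", "neutral", "positive"]
toy_positive = precision_recall(gold, pred, "positive")

# (precision, recall) pairs as reported in the Results.
reported = {
    "positive": (0.82, 0.56),
    "negative": (0.86, 0.69),
    "neutral": (0.99, 1.00),
}
# Derived here for illustration only; the abstract does not report F1.
derived_f1 = {label: f1(p, r) for label, (p, r) in reported.items()}
```

Read this way, a recall of 0.56 for the positive label means the model missed roughly 44% of the words that annotators marked as active symptoms, while a precision of 0.82 means most of the words it did flag were correct.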

Conclusion: We demonstrate the potential of machine learning to gather, track, and analyze symptoms experienced by cancer patients during chemotherapy. Although our initial model requires further optimization to improve its performance, additional model building may yield machine learning methods suitable for deployment in routine clinical care, quality improvement, and research applications.


Source
http://dx.doi.org/10.1016/j.jpainsymman.2018.02.016

Similar Publications

Objective: This study evaluates the utility of word embeddings, generated by large language models (LLMs), for medical diagnosis by comparing the semantic proximity of symptoms to their eponymic disease embedding ("eponymic condition") and the mean of all symptom embeddings associated with a disease ("ensemble mean").

Materials and Methods: Symptom data for 5 diagnostically challenging pediatric diseases (CHARGE syndrome, Cowden disease, POEMS syndrome, rheumatic fever, and tuberous sclerosis) were collected from PubMed. Using the Ada-002 embedding model, disease names and symptoms were translated into vector representations in a high-dimensional space.
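
The comparison described in this abstract can be pictured with a small sketch. It assumes cosine similarity as the measure of semantic proximity, which the truncated abstract does not confirm, and it uses made-up 3-dimensional vectors in place of real Ada-002 embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a list of vectors (the "ensemble mean")."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Toy vectors standing in for embeddings (hypothetical values).
disease_name = [1.0, 0.0, 0.0]          # "eponymic condition" embedding
symptoms = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
ensemble = mean_vector(symptoms)

# Compare one symptom vector against both reference points.
query = [0.0, 1.0, 1.0]
sim_eponymic = cosine(query, disease_name)
sim_ensemble = cosine(query, ensemble)
```

In this contrived case the symptom vector sits much closer to the ensemble mean than to the disease-name vector; with real embeddings the gap would depend on the model and the disease.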


Transcription factor prediction using protein 3D secondary structures.

Bioinformatics

January 2025

Institute for Computational Systems Biology, Universität Hamburg, Hamburg, 22761, Germany.

Motivation: Transcription factors (TFs) are DNA-binding proteins that regulate gene expression. Traditional methods predict a protein as a TF if the protein contains any DNA-binding domains (DBDs) of known TFs. However, this approach fails to identify a novel TF that does not contain any known DBDs.


Background: Postoperative delirium (POD) is a common complication after major surgery and is associated with poor outcomes in older adults. Early identification of patients at high risk of POD can enable targeted prevention efforts. However, existing POD prediction models require inpatient data collected during the hospital stay, which delays predictions and limits scalability.


Importance: Recently, the US Food and Drug Administration gave premarketing approval to an algorithm based on its purported ability to identify individuals at genetic risk for opioid use disorder (OUD). However, the clinical utility of the candidate genetic variants included in the algorithm has not been independently demonstrated.

Objective: To assess the utility of 15 genetic variants from an algorithm intended to predict OUD risk.


Purpose: To extract conjunctival bulbar redness from standardized high-resolution ocular surface photographs of a novel imaging system by implementing an image analysis pipeline.

Methods: Data from two trials (healthy; outgoing ophthalmic clinic) were collected, processed, and used to train a machine learning model for ocular surface segmentation. Various regions of interest were defined to globally and locally extract a redness biomarker based on color intensity.

