J Am Med Inform Assoc
April 2024
Objective: Large language models (LLMs) have shown impressive ability in biomedical question-answering, but have not been adequately investigated for more specific biomedical applications. This study investigates the ChatGPT family of models (GPT-3.5, GPT-4) on biomedical tasks beyond question-answering.
Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information.
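As a hedged sketch of how such high-throughput SDoH extraction might be framed, the snippet below asks a model for a per-category JSON judgment over a note; the category list is illustrative and `call_llm` is a hypothetical stand-in for whichever model API a given study actually used.

```python
import json

# Illustrative SDoH categories; the label set used in the study may differ.
SDOH_CATEGORIES = ["housing_instability", "food_insecurity", "transportation", "social_isolation"]

def build_prompt(note_text: str) -> str:
    """Compose an extraction prompt asking for a JSON object with one boolean per category."""
    return (
        "Read the clinical note and report, as JSON, whether each social determinant "
        f"of health is documented: {', '.join(SDOH_CATEGORIES)}.\n\nNote:\n{note_text}"
    )

def extract_sdoh(note_text: str, call_llm) -> dict:
    """call_llm is a hypothetical callable (prompt -> response text) supplied by the caller."""
    response = call_llm(build_prompt(note_text))
    try:
        labels = json.loads(response)
    except json.JSONDecodeError:
        labels = {}
    # Default any missing or malformed category to False (not documented).
    return {c: bool(labels.get(c, False)) for c in SDOH_CATEGORIES}
```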
Purpose: Radiotherapy (RT) toxicities can impair survival and quality of life, yet remain understudied. Real-world evidence holds potential to improve our understanding of toxicities, but toxicity information is often only in clinical notes. We developed natural language processing (NLP) models to identify the presence and severity of esophagitis from notes of patients treated with thoracic RT.
Purpose: There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities.
Methods: This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019.
Purpose: Real-world evidence for radiation therapy (RT) is limited because it is often documented only in the clinical narrative. We developed a natural language processing system for automated extraction of detailed RT events from text to support clinical phenotyping.
Methods And Materials: A multi-institutional data set of 96 clinician notes, 129 North American Association of Central Cancer Registries cancer abstracts, and 270 RT prescriptions from HemOnc.
Int J Radiat Oncol Biol Phys
July 2021
Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free-text data into structured data that can be extracted and analyzed at scale. In medicine, unlocking the rich, expressive data within the clinical free text of electronic medical records will help tap the full potential of big data for research and clinical purposes.
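As a toy illustration of this unstructured-to-structured conversion (not the method of any study listed here), a simple pattern can lift medication names and doses out of free text into structured records; real clinical NLP relies on much richer models.

```python
import re

# Toy pattern: a drug name followed by a dose in mg.
MED_PATTERN = re.compile(r"(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(\.\d+)?)\s*mg", re.IGNORECASE)

def extract_medications(text: str) -> list[dict]:
    """Return structured {drug, dose_mg} records found in free text."""
    return [
        {"drug": m.group("drug").lower(), "dose_mg": float(m.group("dose"))}
        for m in MED_PATTERN.finditer(text)
    ]

print(extract_medications("Started metoprolol 25 mg daily; continue aspirin 81mg."))
# [{'drug': 'metoprolol', 'dose_mg': 25.0}, {'drug': 'aspirin', 'dose_mg': 81.0}]
```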
Objective: To advance use of real-world data (RWD) for pharmacovigilance, we sought to integrate a high-sensitivity natural language processing (NLP) pipeline for detecting potential adverse drug events (ADEs) with easily interpretable output for high-efficiency human review and adjudication of true ADEs.
Materials And Methods: The adverse drug event presentation and tracking (ADEPT) system employs an open-source NLP pipeline to identify, in clinical notes, mentions of medications and of signs and symptoms potentially indicative of ADEs. ADEPT presents the output to human reviewers by highlighting these drug-event pairs within the context of the clinical note.
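A rough sketch of the drug-event pairing idea described above, using dictionary lookup and sentence-level co-occurrence; the term lists, sentence splitter, and example note are illustrative, not the ADEPT implementation.

```python
import re

# Illustrative dictionaries; a real pipeline would use curated lexicons or NER models.
DRUGS = {"sildenafil", "bosentan", "warfarin"}
SYMPTOMS = {"headache", "flushing", "epistaxis", "rash"}

def sentences(note: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]

def drug_event_pairs(note: str) -> list[dict]:
    """Pair each drug mention with symptom mentions in the same sentence for reviewer display."""
    pairs = []
    for sent in sentences(note):
        tokens = set(re.findall(r"[a-z]+", sent.lower()))
        for drug in DRUGS & tokens:
            for symptom in SYMPTOMS & tokens:
                pairs.append({"drug": drug, "event": symptom, "context": sent})
    return pairs

note = "He reports headache and flushing since the sildenafil dose was increased."
print(drug_event_pairs(note))
```

Each returned pair keeps the sentence it came from, mirroring how ADEPT surfaces drug-event pairs in context for human review.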
Objective: Real-world data (RWD) are increasingly used for pharmacoepidemiology and regulatory innovation. Our objective was to compare adverse drug event (ADE) rates determined from two RWD sources, electronic health records and administrative claims data, among children treated with drugs for pulmonary hypertension.
Materials And Methods: Textual mentions of medications and signs/symptoms that may represent ADEs were identified in clinical notes using natural language processing.
Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities.
Clinical narratives are a valuable source of information for both patient care and biomedical research. Given the unstructured nature of medical reports, specific automatic techniques are required to extract relevant entities from such texts. In the natural language processing (NLP) community, this task is often addressed by using supervised methods.
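One common supervised setup for this entity-extraction task is token-level classification over simple lexical features; the scikit-learn sketch below is a generic illustration under that assumption, with a toy two-sentence corpus, not the cited work's model.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    """Minimal per-token features: the word, its suffix, and neighboring words."""
    return {
        "word": tokens[i].lower(),
        "suffix3": tokens[i][-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Tiny toy corpus: each token labeled as part of a DRUG mention or as O (outside).
sentences = [
    (["Start", "metformin", "500", "mg", "daily"], ["O", "DRUG", "O", "O", "O"]),
    (["Hold", "lisinopril", "for", "hypotension"], ["O", "DRUG", "O", "O"]),
]
X = [token_features(toks, i) for toks, _ in sentences for i in range(len(toks))]
y = [lab for _, labs in sentences for lab in labs]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)
test = ["Continue", "metformin", "as", "before"]
print(model.predict([token_features(test, i) for i in range(len(test))]))
```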
Objective: Comparison of readmission rates requires adjustment for case-mix (ie, differences in patient populations), but previously only claims data were available for this purpose. We examined whether incorporation of relatively readily available clinical data improves prediction of pediatric readmissions and thus might enhance case-mix adjustment.
Methods: We examined 30-day readmissions using claims and electronic health record data for patients ≤18 years and 29 days of age who were admitted to 3 children's hospitals from February 2011 to February 2014.
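A minimal sketch of the underlying modeling idea, combining claims-derived and clinical features in a single readmission classifier; the feature names, values, and labels below are invented for illustration and do not reflect the study's data or model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative features per admission:
#   [prior admissions, chronic condition count (claims), abnormal lab count, discharge hemoglobin (clinical)]
X = np.array([
    [0, 1, 0, 12.5],
    [3, 4, 5, 9.8],
    [1, 2, 2, 11.0],
    [4, 5, 6, 8.9],
    [0, 0, 1, 13.1],
    [2, 3, 4, 10.2],
])
y = np.array([0, 1, 0, 1, 0, 1])  # 30-day readmission outcome (toy labels)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba([[2, 3, 3, 10.0]])[:, 1])  # predicted readmission risk
```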
Precise phenotype information is needed to understand the effects of genetic and epigenetic changes on tumor behavior and responsiveness. Extraction and representation of cancer phenotypes is currently mostly performed manually, making it difficult to correlate phenotypic data to genomic data. In addition, genomic data are being produced at an increasingly faster pace, exacerbating the problem.
The prevalence of severe obesity in children has doubled in the past decade. The objective of this study is to identify the clinical documentation of obesity in young children with a BMI ≥ 99th percentile at two large tertiary care pediatric hospitals. We used a standardized algorithm utilizing data from electronic health records to identify children with severe early onset obesity (BMI ≥ 99th percentile at age <6 years).
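A hedged sketch of the cohort-definition step implied by the abstract (BMI at or above the 99th percentile before age 6); `percentile_lookup` is a hypothetical stand-in for an age- and sex-specific growth-chart reference, which the study's standardized algorithm would implement concretely.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / (height_m ** 2)

def has_severe_early_onset_obesity(weight_kg, height_m, age_months, sex, percentile_lookup) -> bool:
    """Inclusion rule from the abstract: BMI >= 99th percentile at age < 6 years.

    percentile_lookup is a hypothetical callable (bmi, age_months, sex) -> percentile,
    standing in for an age- and sex-specific growth-chart reference.
    """
    if age_months >= 72:  # 6 years = 72 months
        return False
    return percentile_lookup(bmi(weight_kg, height_m), age_months, sex) >= 99.0
```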
Objective: To develop an open-source temporal relation discovery system for the clinical domain. The system is capable of automatically inferring temporal relations between events and time expressions using a multilayered modeling strategy. It can operate at different levels of granularity, from rough temporality expressed as relations between events and the document creation time (DCT), to temporal containment, to fine-grained classic Allen-style relations.
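As a toy illustration of the coarsest layer only (relating an event to the DCT), the sketch below uses a few tense cues; the actual system infers these relations with learned, multilayered models rather than hand-picked rules like these.

```python
import re

PAST_CUES = {"underwent", "received", "was", "had", "completed"}
FUTURE_CUES = {"will", "scheduled", "plan", "planned"}

def event_to_dct_relation(sentence: str) -> str:
    """Very rough BEFORE / AFTER / OVERLAP label for an event relative to document creation time."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if tokens & FUTURE_CUES:
        return "AFTER"
    if tokens & PAST_CUES:
        return "BEFORE"
    return "OVERLAP"

print(event_to_dct_relation("The patient underwent lobectomy in 2018."))  # BEFORE
print(event_to_dct_relation("She is scheduled for follow-up imaging."))   # AFTER
print(event_to_dct_relation("Denies chest pain today."))                  # OVERLAP
```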
Background: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations.
Electronic medical records are emerging as a major source of data for clinical and translational research studies, although phenotypes of interest need to be accurately defined first. This article provides an overview of how to develop a phenotype algorithm from electronic medical records, incorporating modern informatics and biostatistics methods.
Supervised learning is the dominant approach to automatic electronic health records-based phenotyping, but it is expensive due to the cost of manual chart review. Semi-supervised learning takes advantage of both scarce labeled and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks.
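A compact sketch of the EM idea for phenotyping with scarce labels: fit a generative classifier on the labeled records, then alternate between soft-labeling the unlabeled records (E-step) and refitting on the weighted union (M-step). The Gaussian Naive Bayes model, features, and data below are illustrative, not the specific algorithms studied in the paper.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Toy features (e.g., counts of phenotype-relevant codes/terms); labels known for only a few patients.
X_lab = np.array([[5.0, 1.0], [6.0, 0.0], [0.0, 4.0], [1.0, 5.0]])
y_lab = np.array([1, 1, 0, 0])
X_unl = rng.normal(loc=[3.0, 3.0], scale=1.5, size=(50, 2))

model = GaussianNB().fit(X_lab, y_lab)
for _ in range(10):                                 # EM iterations
    post = model.predict_proba(X_unl)               # E-step: soft labels for unlabeled data
    X_all = np.vstack([X_lab, X_unl, X_unl])        # M-step: refit on labeled + soft-labeled data
    y_all = np.concatenate([y_lab, np.zeros(len(X_unl), dtype=int), np.ones(len(X_unl), dtype=int)])
    w_all = np.concatenate([np.ones(len(X_lab)), post[:, 0], post[:, 1]])
    model = GaussianNB().fit(X_all, y_all, sample_weight=w_all)

print(model.predict([[5.5, 0.5], [0.5, 5.5]]))      # likely [1 0] with these toy data
```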
Objectives: To improve the accuracy of mining structured and unstructured components of the electronic medical record (EMR) by adding temporal features to automatically identify patients with rheumatoid arthritis (RA) with methotrexate-induced liver transaminase abnormalities.
Materials And Methods: Codified information and a string-matching algorithm were applied to an RA cohort of 5903 patients from Partners HealthCare to select 1130 patients with potential liver toxicity. Supervised machine learning was applied as our key method.
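A sketch of the screening step only, under the assumption that string matching over note text plus a simple lab threshold is enough to flag candidates for the downstream supervised classifier; the terms and the 80 U/L threshold are illustrative, not the study's criteria.

```python
import re

# Illustrative trigger terms; the study's lexicon and codified criteria were more extensive.
LIVER_TERMS = [r"transaminitis", r"elevated (ALT|AST)", r"hepatotoxicity", r"LFTs? elevated"]
LIVER_RE = re.compile("|".join(LIVER_TERMS), re.IGNORECASE)

def potential_liver_toxicity(note_text: str, max_alt: float, on_methotrexate: bool) -> bool:
    """Flag a patient for review if notes mention liver toxicity terms or ALT is high while on methotrexate."""
    if not on_methotrexate:
        return False
    return bool(LIVER_RE.search(note_text)) or max_alt > 80.0  # 80 U/L is an illustrative threshold

print(potential_liver_toxicity("Plan: hold methotrexate given transaminitis.", max_alt=65.0, on_methotrexate=True))
```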
Natural language processing (NLP) technologies provide an opportunity to extract key patient data from free text documents within the electronic health record (EHR). We are developing a series of components from which to construct NLP pipelines. These pipelines typically begin with a component whose goal is to label sections within medical documents with codes indicating the anticipated semantics of their content.
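A minimal sketch of such a section-labeling component: recognize common header strings and tag the text up to the next header with a section code. The header inventory and codes below are illustrative, not a published scheme.

```python
import re

# Illustrative header-to-code map; real components use much larger header inventories.
SECTION_CODES = {
    "history of present illness": "HPI",
    "past medical history": "PMH",
    "medications": "MEDS",
    "assessment and plan": "A/P",
}
HEADER_RE = re.compile(
    r"^\s*(" + "|".join(map(re.escape, SECTION_CODES)) + r")\s*:",
    re.IGNORECASE | re.MULTILINE,
)

def label_sections(note: str) -> list[tuple[str, str]]:
    """Split a note into (section_code, text) chunks based on recognized headers."""
    matches = list(HEADER_RE.finditer(note))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        code = SECTION_CODES[m.group(1).lower()]
        chunks.append((code, note[m.end():end].strip()))
    return chunks

note = "Medications: metformin, lisinopril\nAssessment and Plan: continue current regimen"
print(label_sections(note))
```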
Objective: To optimally leverage the scalability and unique features of the electronic health records (EHR) for research that would ultimately improve patient care, we need to accurately identify patients and extract clinically meaningful measures. Using multiple sclerosis (MS) as a proof of principle, we showcased how to leverage routinely collected EHR data to identify patients with a complex neurological disorder and derive an important surrogate measure of disease severity heretofore only available in research settings.
Methods: In a cross-sectional observational study, 5,495 MS patients were identified from the EHR systems of two major referral hospitals using an algorithm that includes codified and narrative information extracted using natural language processing.
Research Objective: To develop scalable informatics infrastructure for normalization of both structured and unstructured electronic health record (EHR) data into a unified, concept-based model for high-throughput phenotype extraction.
Materials And Methods: Software tools and applications were developed to extract information from EHRs. Representative and convenience samples of both structured and unstructured data from two EHR systems (Mayo Clinic and Intermountain Healthcare) were used for development and validation.
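A rough sketch of the normalization idea: map structured codes and NLP-extracted text mentions onto a shared concept identifier so downstream phenotype logic can ignore the source. The tiny mappings below stand in for full terminology services and are illustrative only.

```python
from typing import Optional

# Illustrative mappings into a shared concept space; production systems use full
# terminology services mapped to standard vocabularies, not small dictionaries.
CODE_TO_CONCEPT = {"ICD9:250.00": "C0011860", "ICD10:E11.9": "C0011860"}   # type 2 diabetes
TEXT_TO_CONCEPT = {"type 2 diabetes": "C0011860", "t2dm": "C0011860"}

def normalize_structured(code: str) -> Optional[str]:
    return CODE_TO_CONCEPT.get(code)

def normalize_mention(mention: str) -> Optional[str]:
    return TEXT_TO_CONCEPT.get(mention.lower().strip())

def patient_concepts(codes: list, mentions: list) -> set:
    """Unified concept set for one patient, drawn from both structured and unstructured data."""
    concepts = {normalize_structured(c) for c in codes} | {normalize_mention(m) for m in mentions}
    return {c for c in concepts if c}

print(patient_concepts(["ICD10:E11.9"], ["T2DM"]))  # both sources resolve to one concept
```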