J Am Med Inform Assoc
August 2024
Objective: To present a general framework providing high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes) through a variety of approaches, including but not limited to machine learning and natural language processing methods to incorporate rich electronic health record data.
Materials And Methods: Drawing on extensive prior phenotyping experiences and insights derived from 3 algorithm development projects conducted specifically for this purpose, our team with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods conceptualized stages of development and corresponding sets of principles, strategies, and practical guidelines for improving the algorithm development process.
Results: We propose 5 stages of algorithm development and corresponding principles, strategies, and guidelines: (1) assessing fitness-for-purpose, (2) creating gold standard data, (3) feature engineering, (4) model development, and (5) model evaluation.
We sought to determine whether machine learning and natural language processing (NLP) applied to electronic medical records could improve performance of automated health-care claims-based algorithms to identify anaphylaxis events using data on 516 patients with outpatient, emergency department, or inpatient anaphylaxis diagnosis codes during 2015-2019 in 2 integrated health-care institutions in the Northwest United States. We used one site's manually reviewed gold-standard outcomes data for model development and the other's for external validation based on cross-validated area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), and sensitivity. In the development site 154 (64%) of 239 potential events met adjudication criteria for anaphylaxis compared with 180 (65%) of 277 in the validation site.
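The counts reported above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the helper names (`proportion`, `ppv`, `sensitivity`) are hypothetical, and only the counts 154/239 and 180/277 come from the abstract. It treats the share of code-identified potential events confirmed by adjudication as the positive predictive value of the diagnosis codes at each site.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: confirmed cases among all flagged cases."""
    return proportion(true_pos, true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity: flagged true cases among all true cases."""
    return proportion(true_pos, true_pos + false_neg)


# Counts taken from the abstract: confirmed events / potential events per site.
dev_confirmed, dev_total = 154, 239
val_confirmed, val_total = 180, 277

# Flagged-but-unconfirmed events act as false positives of the diagnosis codes.
dev_ppv = ppv(dev_confirmed, dev_total - dev_confirmed)
val_ppv = ppv(val_confirmed, val_total - val_confirmed)

print(f"development site: {dev_ppv:.0%}")
print(f"validation site:  {val_ppv:.0%}")
```

Sensitivity cannot be computed from these counts alone, because the abstract does not report events missed by the diagnosis codes; the `sensitivity` helper is included only to show the complementary calculation.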
Background: Acute pancreatitis is a serious gastrointestinal disease that is an important target for drug safety surveillance. Little is known about the accuracy of ICD-10 codes for acute pancreatitis in the United States, or their performance in specific clinical settings. We conducted a validation study to assess the accuracy of acute pancreatitis ICD-10 diagnosis codes in inpatient, emergency department (ED), and outpatient settings.
J Drug Assess
April 2020
Opioid surveillance in response to the opioid epidemic will benefit from scalable, automated algorithms for identifying patients with clinically documented signs of problem prescription opioid use. Existing algorithms lack accuracy. We sought to develop a high-sensitivity, high-specificity classification algorithm based on widely available structured health data to identify patients receiving chronic extended-release/long-acting (ER/LA) therapy with evidence of problem use to support subsequent epidemiologic investigations.
Pharmacoepidemiol Drug Saf
August 2019
Purpose: To enhance automated methods for accurately identifying opioid-related overdoses and classifying types of overdose using electronic health record (EHR) databases.
Methods: We developed a natural language processing (NLP) software application to code clinical text documentation of overdose, including identification of intention for self-harm, substances involved, substance abuse, and error in medication usage. Using datasets balanced with cases of suspected overdose and records of individuals at elevated risk for overdose, we developed and validated the application using Kaiser Permanente Northwest data, then tested portability of the application using Kaiser Permanente Washington data.
Purpose: To facilitate surveillance and evaluate interventions addressing opioid-related overdoses, algorithms are needed for use in large health care databases to identify and differentiate community-occurring opioid-related overdoses from inpatient-occurring opioid-related overdose/oversedation.
Methods: Data were from Kaiser Permanente Northwest (KPNW), a large integrated health plan. We iteratively developed and evaluated an algorithm for electronically identifying inpatient overdose/oversedation in KPNW hospitals from 1 January 2008 to 31 December 2014.
Pharmacoepidemiol Drug Saf
August 2019
Purpose: The study aims to develop and validate algorithms to identify and classify opioid overdoses using claims and other coded data, and clinical text extracted from electronic health records using natural language processing (NLP).
Methods: Primary data were derived from Kaiser Permanente Northwest (2008-2014), an integrated health care system (n > 475 000 unique individuals per year). Data included International Classification of Diseases, Ninth Revision (ICD-9) codes for nonfatal diagnoses, International Classification of Diseases, Tenth Revision (ICD-10) codes for fatal events, clinical notes, and prescription medication records.
Introduction: Brief smoking-cessation interventions in primary care settings are effective, but delivery of these services remains low. The Centers for Medicare and Medicaid Services' Meaningful Use (MU) of Electronic Health Record (EHR) Incentive Program could increase rates of smoking assessment and cessation assistance among vulnerable populations. This study examined whether smoking status assessment, cessation assistance, and odds of being a current smoker changed after Stage 1 MU implementation.
Background: Numerous population-based surveys indicate that overweight and obese patients can benefit from lifestyle counseling during routine clinical care.
Purpose: To determine if natural language processing (NLP) could be applied to information in the electronic health record (EHR) to automatically assess delivery of weight management-related counseling in clinical healthcare encounters.
Methods: The MediClass system with NLP capabilities was used to identify weight-management counseling in EHRs.
Comparative effectiveness research (CER) has the potential to transform the current health care delivery system by identifying the most effective medical and surgical treatments, diagnostic tests, disease prevention methods, and ways to deliver care for specific clinical conditions. To be successful, such research requires the identification, capture, aggregation, integration, and analysis of disparate data sources held by different institutions with diverse representations of the relevant clinical events. In an effort to address these diverse demands, there have been multiple new designs and implementations of informatics platforms that provide access to electronic clinical data and the governance infrastructure required for interinstitutional CER.
In an experiment to investigate cognitive skill differences between clinicians and lay persons, eight individuals in each group were asked to determine if an explicit concept existed in an ambulatory encounter note (a simple task) or if the concept could be inferred from the same note (a complex task). Subjects answered questions, highlighted text used to answer each question, and commented on their reasoning for selecting specific text. Quantitative results were mixed for expert vs.
The Vaccine Safety Datalink (VSD) is a collaboration between the CDC and eight large HMOs to investigate adverse events following immunization through analyses of clinical data. We modified an existing system, called MediClass, that uses natural language processing to identify clinical events recorded in electronic medical records (EMRs). We customized MediClass so it could detect possible vaccine adverse events (VAEs) generally, and gastrointestinal-related VAEs in particular, in the text clinical notes of encounters recorded in the EMR of a large HMO.
Electronic medical records (EMRs) hold the promise of making routine comprehensive measurement of care quality a reality. However, there are many informatics challenges that stand in the way of this goal. Guidelines are rarely stated in precise enough language for automated measurement of clinical practices and the data necessary for that measurement often reside in the text notes of EMRs.
Objective: We estimated the quality of life impact of vision loss in a community-based population with diabetes.
Design And Methods: We randomly surveyed 4,000 members of a large health maintenance organization with type 2 diabetes to assess quality of life using the EQ-5D instrument. Visual acuity was obtained by automated text processing of clinical notes recorded during the two years preceding subjects' surveys.
Background: Medical informatics has been guided by an individual-centered model of human cognition, inherited from classical theory of mind, in which knowledge, problem-solving, and information-processing responsible for intelligent behavior all derive from the inner workings of an individual agent.
Objectives And Results: In this paper we argue that medical informatics' commitment to the classical model of cognition conflates the processing performed by the minds of individual agents with the processing performed by the larger distributed activity systems within which individuals operate. We review trends in cognitive science that seek to close the gap between general-purpose models of cognition and applied considerations of real-world human performance.