Evaluating Expert-Layperson Agreement in Identifying Jargon Terms in Electronic Health Record Notes: Observational Study.

J Med Internet Res

Center for Biomedical and Health Research in Data Sciences, Miner School of Computer and Information Sciences, University of Massachusetts Lowell, Lowell, MA, United States.

Published: October 2024

AI Article Synopsis

  • Studies indicate that patients, especially those with low health literacy, struggle to understand medical terms in electronic health records (EHR), prompting the creation of the NoteAid dictionary to define these terms for better patient comprehension.
  • The study aimed to see if medical experts and everyday people (laypeople) agree on what counts as medical jargon, using a comparison of their identifications in EHR notes from participants recruited through Amazon Mechanical Turk.
  • Results showed that medical experts identified 59% of terms as jargon, while laypeople identified only 25.6%, with good agreement among experts and fair agreement among laypeople regarding jargon classification.

Article Abstract

Background: Studies have shown that patients have difficulty understanding medical jargon in electronic health record (EHR) notes, particularly patients with low health literacy. In creating the NoteAid dictionary of medical jargon for patients, a panel of medical experts selected terms they perceived as needing definitions for patients.

Objective: This study aims to determine whether experts and laypeople agree on what constitutes medical jargon.

Methods: Using an observational study design, we compared the ability of medical experts and laypeople to identify medical jargon in EHR notes. The laypeople were recruited from Amazon Mechanical Turk. Participants were shown 20 sentences from EHR notes, which contained 325 potential jargon terms as identified by the medical experts. We collected demographic information about the laypeople's age, sex, race or ethnicity, education, native language, and health literacy. Health literacy was measured with the Single Item Literacy Screener. Our evaluation metrics were the proportion of terms rated as jargon, sensitivity, specificity, Fleiss κ for agreement among medical experts and among laypeople, and the Kendall rank correlation statistic between the medical experts and laypeople. We performed subgroup analyses by layperson characteristics. We fit a beta regression model with a logit link to examine the association between layperson characteristics and whether a term was classified as jargon.
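The agreement metric named in the Methods, Fleiss κ, can be computed directly from a table of per-item category counts. The sketch below is a generic illustration of the statistic, not the study's analysis code, and the example ratings are invented (here, 6 hypothetical raters labeling 5 terms as jargon or not jargon).

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters.

    counts[i][j] = number of raters assigning item i to category j.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Mean per-item agreement, P-bar
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement, P_e, from marginal category proportions
    n_cats = len(counts[0])
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)

# Illustrative data: 5 terms, 6 raters, columns = (jargon, not jargon)
ratings = [
    [6, 0],
    [5, 1],
    [1, 5],
    [0, 6],
    [3, 3],
]
print(round(fleiss_kappa(ratings), 3))  # 0.493
```

Values near 1 indicate near-perfect agreement beyond chance, which is how the study's κ of 0.781 (experts) versus 0.590 (laypeople) should be read.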

Results: The average proportion of terms identified as jargon by the medical experts was 59% (1150/1950, 95% CI 56.1%-61.8%), and the average proportion of terms identified as jargon by the laypeople overall was 25.6% (22,480/87,750, 95% CI 25%-26.2%). There was good agreement among medical experts (Fleiss κ=0.781, 95% CI 0.753-0.809) and fair agreement among laypeople (Fleiss κ=0.590, 95% CI 0.589-0.591). The beta regression model had a pseudo-R² of 0.071, indicating that demographic characteristics explained very little of the variability in the proportion of terms identified as jargon by laypeople. Using laypeople's identification of jargon as the gold standard, the medical experts had high sensitivity (91.7%, 95% CI 90.1%-93.3%) and specificity (88.2%, 95% CI 86%-90.5%) in identifying jargon terms.
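The sensitivity and specificity in the Results treat the laypeople's jargon labels as the gold standard and the experts' labels as predictions. A minimal sketch of that computation, using invented labels rather than the study data:

```python
def sensitivity_specificity(gold, pred):
    """Compute (sensitivity, specificity) from parallel boolean lists.

    gold: gold-standard labels (True = jargon, per laypeople).
    pred: predicted labels (True = jargon, per experts).
    """
    tp = sum(g and p for g, p in zip(gold, pred))          # true positives
    fn = sum(g and not p for g, p in zip(gold, pred))      # missed jargon
    tn = sum(not g and not p for g, p in zip(gold, pred))  # true negatives
    fp = sum(not g and p for g, p in zip(gold, pred))      # over-flagged
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative labels for 6 terms
gold = [True, True, True, False, False, False]
pred = [True, True, False, False, False, True]
sens, spec = sensitivity_specificity(gold, pred)
print(sens, spec)  # 2/3 sensitivity, 2/3 specificity
```

High sensitivity here means experts rarely miss a term laypeople consider jargon; high specificity means they rarely flag a term laypeople do not.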

Conclusions: To ensure coverage of possible jargon terms, the medical experts were deliberately liberal in selecting terms for inclusion. The fair agreement among laypeople shows that this breadth is needed, as opinions vary among laypeople about what constitutes jargon. We showed that medical experts can accurately identify jargon terms for annotation that are useful to laypeople.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522659
DOI: http://dx.doi.org/10.2196/49704

Publication Analysis

Top Keywords

medical experts (40)
jargon terms (16)
experts laypeople (16)
terms identified (16)
proportion terms (16)
jargon (14)
medical (14)
medical jargon (12)
ehr notes (12)
health literacy (12)

Similar Publications

Background: Expanding access to equitable health insurance is an important lever towards the overall strategy for achieving universal health coverage. In Nigeria, health insurance coverage is low with a renewed government action on increasing access to and coverage of high-quality healthcare services to citizens, particularly for the vulnerable and poor population. Therefore, our study co-creates the priorities for expanding health insurance in Nigeria, focusing on key policy reforms, public advocacy, and innovative financing strategies to ensure broader and more equitable coverage for the population.


Background: The COVID-19 pandemic entailed a global health crisis, significantly affecting medical service delivery in Germany as well as elsewhere. While intensive care capacities were overloaded by COVID cases, not only elective cases but also non-COVID cases requiring urgent treatment unexpectedly decreased, potentially leading to a deterioration in health outcomes. However, these developments were only uncovered retrospectively.


The TRIPOD-LLM reporting guideline for studies using large language models.

Nat Med

January 2025

Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA.

Large language models (LLMs) are rapidly being adopted in healthcare, necessitating standardized reporting guidelines. We present transparent reporting of a multivariable model for individual prognosis or diagnosis (TRIPOD)-LLM, an extension of the TRIPOD + artificial intelligence statement, addressing the unique challenges of LLMs in biomedical applications. TRIPOD-LLM provides a comprehensive checklist of 19 main items and 50 subitems, covering key aspects from title to discussion.


Large language models (LLMs) have shown promise in medical question answering, with Med-PaLM being the first to exceed a 'passing' score in United States Medical Licensing Examination style questions. However, challenges remain in long-form medical question answering and handling real-world workflows. Here, we present Med-PaLM 2, which bridges these gaps with a combination of base LLM improvements, medical domain fine-tuning and new strategies for improving reasoning and grounding through ensemble refinement and chain of retrieval.

