Assessing the Ability of a Large Language Model to Score Free-Text Medical Student Clinical Notes: Quantitative Study.

Harry B Burke, Albert Hoang, Joseph O Lopreiato, Heidi King, Paul Hemmer, Michael Montgomery, Viktoria Gagarin

JMIR Med Educ

Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, United States, 1 301-938-2212.

Published: July 2024

Background: Teaching medical students the skills required to acquire, interpret, apply, and communicate clinical information is an integral part of medical education. A crucial aspect of this process involves providing students with feedback regarding the quality of their free-text clinical notes.

Objective: The goal of this study was to assess the ability of ChatGPT 3.5, a large language model, to score medical students' free-text history and physical notes.

Methods: This is a single-institution, retrospective study. Standardized patients learned a prespecified clinical case and, acting as the patient, interacted with medical students. Each student wrote a free-text history and physical note of their interaction. The students' notes were scored independently by the standardized patients and ChatGPT using a prespecified scoring rubric that consisted of 85 case elements. The measure of accuracy was percent correct.

Results: The study population consisted of 168 first-year medical students. There was a total of 14,280 scores. The ChatGPT incorrect scoring rate was 1.0%, and the standardized patient incorrect scoring rate was 7.2%. The ChatGPT error rate was 86% lower than the standardized patient error rate. The ChatGPT mean incorrect scoring rate of 12 (SD 11) was significantly lower than the standardized patient mean incorrect scoring rate of 85 (SD 74; P=.002).
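The arithmetic behind the reported totals can be checked with a short sketch. It uses only figures stated in the abstract (168 students, 85 rubric elements, and incorrect scoring rates of 1.0% and 7.2%). The function and constant names are hypothetical, chosen for illustration, and are not part of the study's methods.

```python
# Figures taken from the abstract; all names below are illustrative assumptions.
N_STUDENTS = 168                 # first-year medical students
N_ELEMENTS = 85                  # case elements in the scoring rubric
CHATGPT_INCORRECT_RATE = 0.010   # 1.0% of scores incorrect
SP_INCORRECT_RATE = 0.072        # 7.2% of scores incorrect


def total_scores(n_students: int, n_elements: int) -> int:
    """One score per rubric element per student note."""
    return n_students * n_elements


def relative_reduction(new_rate: float, baseline_rate: float) -> float:
    """Fraction by which new_rate is lower than baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate


total = total_scores(N_STUDENTS, N_ELEMENTS)
reduction = relative_reduction(CHATGPT_INCORRECT_RATE, SP_INCORRECT_RATE)

print(f"Total scores: {total:,}")
print(f"Relative reduction in incorrect scoring rate: {reduction:.0%}")
```

Under these inputs, the product of students and rubric elements reproduces the 14,280 scores, and the relative difference between the two incorrect scoring rates rounds to the reported 86%.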

Conclusions: ChatGPT demonstrated a significantly lower error rate compared to standardized patients. This is the first study to assess the ability of a generative pretrained transformer (GPT) program to score medical students' standardized patient-based free-text clinical notes. It is expected that, in the near future, large language models will provide real-time feedback to practicing physicians regarding their free-text notes. GPT artificial intelligence programs represent an important advance in medical education and medical practice.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11327632
DOI: http://dx.doi.org/10.2196/56342

Similar Publications

Diabe-teach: a randomized controlled trial of a gamified approach to enhance medical undergraduates' knowledge and comprehension of diabetes mellitus.

BMC Med Educ

January 2025

Dubai Medical College for Girls, Dubai, UAE.

Mariam Shadan Heba Ismail Fathima Hana Mohamed Naushad

The increasing prevalence of diabetes mellitus worldwide necessitates that medical undergraduates acquire a deep understanding of the disease to ensure accurate diagnosis and effective management. Traditional teaching methods, while foundational, often lack the interactive elements that enhance student engagement and knowledge retention. This study aimed to evaluate the effectiveness of a novel educational board game, "Diabe-teach," in enhancing knowledge retention among medical students compared with conventional self-study methods.

Sources of HIV information and women's HIV knowledge in Southwest Sumba Indonesia: a cross-sectional study with mediation analysis.

BMC Public Health

January 2025

Public Policy, Management, and Analytics, College of Urban Planning and Public Affairs, University of Illinois at Chicago, Chicago, IL, 60607, USA.

Angela Kurniadi Judith A Levy Timothy P Johnson

Background: Despite multiple years of government HIV educational efforts, the growing trend of new cases among women in Indonesia runs parallel with their seemingly overall lack of comprehensive knowledge about HIV. A major prevention challenge for the Indonesian government lies in delivering HIV prevention education across the world's largest archipelago. This study investigates comprehensive HIV knowledge among reproductive-age women in Southwest Sumba, Indonesia, and the sources through which they report having learned about HIV along with potential mediators of the relationship between socioeconomic status (SES) and HIV knowledge.

Comparative analysis of the safety and efficacy of fenestrated pedicle screw with cement and conventional pedicle screw with cement in the treatment of osteoporotic vertebral fractures: A meta-analysis.

Chin J Traumatol

December 2024

Department of Orthopaedics, Xinhua Hospital of Zhejiang Province, The Second Affiliated Hospital of Zhejiang Chinese Medical University, Hangzhou, 310003, China.

Li Cao Hong-Jie Xu Yi-Kang Yu Huan-Huan Tang Bo-Hao Fang

Purpose: Bone cement-reinforced fenestrated pedicle screws (FPSs) have been widely used in the internal fixation and repair of the spine with osteoporosis in recent years and show significant improvement in fixation strength and stability. However, compared with conventional reinforcement methods, the advantages of bone cement-reinforced FPSs remain undetermined. This article compares the effects of fenestrated and conventional pedicle screws (CPSs) combined with bone cement in the treatment of osteoporosis.

Understanding depression and the PHQ-9 items among people living with HIV: A multiple methods qualitative study in Yaoundé, Cameroon.

SSM Ment Health

December 2024

Division of General Internal Medicine, Department of Medicine, Albert Einstein College of Medicine, Bronx, NY, USA.

Natalia Zotova Dana Watnick Rogers Awoh Ajeh Elodie Flore Tchiengang Moungang Julie Laure Nguemo Noumedem

People living with HIV (PLWH) are disproportionately affected by depression, which often remains underdiagnosed and untreated, negatively impacting quality of life and treatment outcomes. Low-resource settings often lack clinical professionals to identify depression; screening tools such as the PHQ-9 therefore allow for broader depression screening. This qualitative study among PLWH in Yaoundé, Cameroon, aimed to a) explore local understandings of depression and mental distress and b) assess comprehension and interpretation of the PHQ-9 items and response categories.

Utilizing domain knowledge to improve the classification of intravenous contrast phase of CT scans.

Comput Med Imaging Graph

November 2024

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Clinical Center, National Institutes of Health, United States of America.

Liangchen Liu Jianfei Liu Bikash Santra Christopher Parnell Pritam Mukherjee

Multiple intravenous contrast phases of CT scans are commonly used in clinical practice to facilitate disease diagnosis. However, contrast phase information is commonly missing or incorrect due to discrepancies in CT series descriptions and imaging practices. This work aims to develop a classification algorithm to automatically determine the contrast phase of a CT scan.
