Background: Teaching medical students the skills required to acquire, interpret, apply, and communicate clinical information is an integral part of medical education. A crucial aspect of this process involves providing students with feedback regarding the quality of their free-text clinical notes.

Objective: The goal of this study was to assess the ability of ChatGPT 3.5, a large language model, to score medical students' free-text history and physical notes.

Methods: This is a single-institution, retrospective study. Standardized patients learned a prespecified clinical case and, acting as the patient, interacted with medical students. Each student wrote a free-text history and physical note of their interaction. The students' notes were scored independently by the standardized patients and ChatGPT using a prespecified scoring rubric that consisted of 85 case elements. The measure of accuracy was percent correct.

Results: The study population consisted of 168 first-year medical students. There was a total of 14,280 scores. The ChatGPT incorrect scoring rate was 1.0%, and the standardized patient incorrect scoring rate was 7.2%. The ChatGPT error rate was 86% lower than the standardized patient error rate. The ChatGPT mean incorrect scoring rate of 12 (SD 11) was significantly lower than the standardized patient mean incorrect scoring rate of 85 (SD 74; P=.002).
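
The headline figures in the Results can be cross-checked with a few lines of arithmetic. The sketch below assumes one score per student per rubric element and that both reported rates (1.0% and 7.2%) are taken over the same set of scores; the variable names are illustrative and do not come from the study.

```python
# Cross-check of the reported Results figures.
# Variable names are illustrative assumptions, not taken from the study.
students = 168          # first-year medical students
rubric_elements = 85    # case elements in the scoring rubric

# Assumes one score per student per rubric element.
total_scores = students * rubric_elements

chatgpt_rate = 1.0      # reported ChatGPT incorrect scoring rate, in percent
sp_rate = 7.2           # reported standardized patient incorrect scoring rate, in percent

# Relative reduction of the ChatGPT rate with respect to the standardized patient rate.
relative_reduction = (sp_rate - chatgpt_rate) / sp_rate * 100

print(total_scores)               # 14280
print(round(relative_reduction))  # 86
```

Under these assumptions, the product matches the reported 14,280 scores, and the relative reduction rounds to the reported 86%.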

Conclusions: ChatGPT demonstrated a significantly lower error rate compared to standardized patients. This is the first study to assess the ability of a generative pretrained transformer (GPT) program to score medical students' standardized patient-based free-text clinical notes. It is expected that, in the near future, large language models will provide real-time feedback to practicing physicians regarding their free-text notes. GPT artificial intelligence programs represent an important advance in medical education and medical practice.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11327632
DOI: http://dx.doi.org/10.2196/56342
