ChatGPT has garnered attention as a multifaceted AI chatbot with potential applications in medicine. Despite intriguing preliminary findings in areas such as clinical management and patient education, a substantial knowledge gap remains in comprehensively understanding the opportunities and limitations of ChatGPT's capabilities, especially in medical test-taking and education. A total of n = 2,729 USMLE Step 1 practice questions were extracted from the Amboss question bank. After excluding 352 image-based questions, the remaining 2,377 text-based questions were categorized and entered manually into ChatGPT, and its responses were recorded. ChatGPT's overall performance was analyzed by question difficulty, category, and content with regard to specific signal words and phrases. ChatGPT achieved an overall accuracy of 55.8% across the 2,377 questions. Its performance showed a significant inverse correlation with question difficulty (r = -0.306; p < 0.001), while remaining comparable to that of the human user peer group at each difficulty level. Notably, ChatGPT outperformed the human peer group on serology-related questions (61.1% vs. 53.8%; p = 0.005) but struggled with ECG-related content (42.9% vs. 55.6%; p = 0.021). It also performed significantly worse on pathophysiology-related question stems (signal phrase: "what is the most likely/probable cause"). Overall, ChatGPT performed consistently across question categories and difficulty levels relative to its human peer group. These findings emphasize the need for further investigations to explore the potential and limitations of ChatGPT in medical examination and education.
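
The statistical comparisons reported above (a difficulty–performance correlation and per-category proportion tests) can be reproduced in outline. The following is a minimal sketch, not the authors' analysis code; the CSV file and column names are illustrative assumptions.

```python
# Minimal sketch of the reported analysis; the file "amboss_results.csv" and
# its column names are assumptions, not artifacts of the study.
import pandas as pd
from scipy import stats

df = pd.read_csv("amboss_results.csv")  # one row per question
# Assumed columns:
#   difficulty: Amboss difficulty rating, 1 (easiest) to 5 (hardest)
#   correct:    1 if ChatGPT answered correctly, 0 otherwise
#   category:   topic label, e.g. "serology", "ecg", ...

# Overall accuracy (reported: 55.8% over 2,377 questions)
print(f"Overall accuracy: {df['correct'].mean():.1%} (n={len(df)})")

# Point-biserial correlation between difficulty and correctness
# (reported: r = -0.306, p < 0.001)
r, p = stats.pearsonr(df["difficulty"], df["correct"])
print(f"difficulty vs. performance: r = {r:.3f}, p = {p:.3g}")

# Two-proportion comparison for one category against all others
# (e.g. serology: 61.1% vs. 53.8%, p = 0.005)
mask = df["category"] == "serology"
table = pd.crosstab(mask, df["correct"])
chi2, p_cat, _, _ = stats.chi2_contingency(table, correction=True)
print(f"serology vs. rest: chi2 = {chi2:.2f}, p = {p_cat:.3g}")
```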

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11169536
DOI: http://dx.doi.org/10.1038/s41598-024-63997-7

Similar Publications

Medical school exams, such as those from the National Board of Medical Examiners (NBME) and the United States Medical Licensing Examination (USMLE), assess essential knowledge and skills for safe patient care and are critical for student advancement and securing competitive residencies. Understanding the correlation between exam scores and medical school performance, as well as identifying trends among high scorers, provides valuable insights for both medical students and educators. This review examines the link between study resources and NBME exam scores, as well as psychological factors influencing these outcomes.

Objectives: To provide a cross-sectional view of the current opinions surrounding the urology match by analyzing data from the annual Society of Academic Urologists Program Director Surveys conducted between 2022 and 2024.

Methods: Data were collected through surveys distributed to all urology program directors, consisting of questions covering program demographics, applicant selection criteria, preference signals, virtual interviews, and other relevant topics.

Results: A total of 89, 90, and 89 program directors participated in the surveys for the years 2022, 2023, and 2024, respectively.

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI's GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions.
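
As a rough illustration of how such a generate/flag/correct pipeline could be wired together, here is a minimal sketch using the OpenAI Python client. The prompts, model name, and overall structure are assumptions for illustration, not the published QUEST-AI implementation.

```python
# Illustrative generate -> flag -> correct loop; prompts and model choice
# are assumptions, not the QUEST-AI system described in the publication.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4"    # assumed; the study references OpenAI's GPT-4

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# 1) Generate a USMLE-style question on a given topic.
question = ask(
    "Write one USMLE Step 1 multiple-choice question (stem, five options "
    "A-E, and the correct answer) about beta-blocker pharmacology."
)

# 2) Flag potential flaws (ambiguous stem, no single best answer, etc.).
critique = ask(
    "Review this USMLE-style question for errors or ambiguity and list "
    f"any problems you find:\n\n{question}"
)

# 3) Correct the flagged problems and emit a revised question.
revised = ask(
    "Rewrite the question below so it fixes the listed problems, keeping "
    f"the USMLE format:\n\nQUESTION:\n{question}\n\nPROBLEMS:\n{critique}"
)
print(revised)
```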

Standardized examinations measure progress throughout medical education. Successful completion of the American Board of Internal Medicine Certification Examination (ABIM-CE) benchmarks completion of internal medicine (IM) residency training. Recent declines in initial ABIM-CE pass rates may prompt residency programs to examine strategies to improve learner performance.
