ChatGPT has garnered attention as a multifaceted AI chatbot with potential applications in medicine. Despite intriguing preliminary findings in areas such as clinical management and patient education, a substantial knowledge gap remains in comprehensively understanding the opportunities and limitations of ChatGPT's capabilities, especially in medical test-taking and education. A total of n = 2,729 USMLE Step 1 practice questions were extracted from the Amboss question bank. After excluding 352 image-based questions, the remaining 2,377 text-based questions were categorized and entered manually into ChatGPT, and its responses were recorded. ChatGPT's overall performance was analyzed by question difficulty, category, and content with regard to specific signal words and phrases. ChatGPT achieved an overall accuracy of 55.8% across the 2,377 questions. Its performance showed a significant inverse correlation with question difficulty (r = -0.306; p < 0.001), while remaining comparable to that of the human user peer group at each difficulty level. Notably, ChatGPT outperformed the human peer group on serology-related questions (61.1% vs. 53.8%; p = 0.005) but struggled with ECG-related content (42.9% vs. 55.6%; p = 0.021). It also performed significantly worse on pathophysiology-related question stems (signal phrase: "what is the most likely/probable cause"). Overall, ChatGPT performed consistently across question categories and difficulty levels relative to its human peer group. These findings emphasize the need for further investigations to explore the potential and limitations of ChatGPT in medical examination and education.
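
The statistical comparisons reported above (a difficulty–performance correlation and per-category proportion tests) can be reproduced in outline. The following is a minimal sketch, not the authors' analysis code; the CSV file and column names are illustrative assumptions.

```python
# Minimal sketch of the reported analysis; the file "amboss_results.csv" and
# its column names are assumptions, not artifacts of the study.
import pandas as pd
from scipy import stats

df = pd.read_csv("amboss_results.csv")  # one row per question
# Assumed columns:
#   difficulty: Amboss difficulty rating, 1 (easiest) to 5 (hardest)
#   correct:    1 if ChatGPT answered correctly, 0 otherwise
#   category:   topic label, e.g. "serology", "ecg", ...

# Overall accuracy (reported: 55.8% over 2,377 questions)
print(f"Overall accuracy: {df['correct'].mean():.1%} (n={len(df)})")

# Point-biserial correlation between difficulty and correctness
# (reported: r = -0.306, p < 0.001)
r, p = stats.pearsonr(df["difficulty"], df["correct"])
print(f"difficulty vs. performance: r = {r:.3f}, p = {p:.3g}")

# Two-proportion comparison for one category against all others
# (e.g. serology: 61.1% vs. 53.8%, p = 0.005)
mask = df["category"] == "serology"
table = pd.crosstab(mask, df["correct"])
chi2, p_cat, _, _ = stats.chi2_contingency(table, correction=True)
print(f"serology vs. rest: chi2 = {chi2:.2f}, p = {p_cat:.3g}")
```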

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11169536
DOI: http://dx.doi.org/10.1038/s41598-024-63997-7

Similar Publications

Medical school exams, such as those from the National Board of Medical Examiners (NBME) and the United States Medical Licensing Examination (USMLE), assess essential knowledge and skills for safe patient care and are critical for student advancement and securing competitive residencies. Understanding the correlation between exam scores and medical school performance, as well as identifying trends among high scorers, provides valuable insights for both medical students and educators. This review examines the link between study resources and NBME exam scores, as well as psychological factors influencing these outcomes.

Objectives: To provide a cross-sectional view of the current opinions surrounding the urology match by analyzing data from the annual Society of Academic Urologists Program Director Surveys conducted between 2022 and 2024.

Methods: Data were collected through surveys distributed to all urology program directors, consisting of questions covering program demographics, applicant selection criteria, preference signals, virtual interviews, and other relevant topics.

Results: A total of 89, 90, and 89 program directors participated in the surveys for the years 2022, 2023, and 2024, respectively.

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet the process of creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs), such as OpenAI's GPT-4, have demonstrated proficiency in answering medical exam questions, their potential in generating such questions remains underexplored. This study presents QUEST-AI, a novel system that utilizes LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions.
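
As a rough illustration of how such a generate/flag/correct pipeline could be wired together, here is a minimal sketch using the OpenAI Python client. The prompts, model name, and overall structure are assumptions for illustration, not the published QUEST-AI implementation.

```python
# Illustrative generate -> flag -> correct loop; prompts and model choice
# are assumptions, not the QUEST-AI system described in the publication.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4"    # assumed; the study references OpenAI's GPT-4

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's text reply."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# 1) Generate a USMLE-style question on a given topic.
question = ask(
    "Write one USMLE Step 1 multiple-choice question (stem, five options "
    "A-E, and the correct answer) about beta-blocker pharmacology."
)

# 2) Flag potential flaws (ambiguous stem, no single best answer, etc.).
critique = ask(
    "Review this USMLE-style question for errors or ambiguity and list "
    f"any problems you find:\n\n{question}"
)

# 3) Correct the flagged problems and emit a revised question.
revised = ask(
    "Rewrite the question below so it fixes the listed problems, keeping "
    f"the USMLE format:\n\nQUESTION:\n{question}\n\nPROBLEMS:\n{critique}"
)
print(revised)
```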

Standardized examinations measure progress throughout medical education. Successful completion of the American Board of Internal Medicine Certification Examination (ABIM-CE) benchmarks completion of internal medicine (IM) residency training. Recent declines in initial ABIM-CE pass rates may prompt residency programs to examine strategies to improve learner performance.
