Purpose: The integration of artificial intelligence (AI) into medical education has witnessed significant progress, particularly in the domain of language models. This study assesses the performance of two notable language models, ChatGPT and BingAI Precise, in answering National Eligibility cum Entrance Test for Postgraduates (NEET-PG)-style practice questions that simulate medical exam formats.
Methods: A cross-sectional study conducted in June 2023 involved assessing ChatGPT and BingAI Precise using three sets of NEET-PG practice exams, comprising 200 questions each. The questions were categorized by difficulty levels (easy, moderate, difficult), excluding those with images or tables. The AI models' responses were compared to reference answers provided by the Dr. Bhatia Medical Coaching Institute (DBMCI). Statistical analysis was employed to evaluate accuracy, coherence, and overall performance across different difficulty levels.
Results: In the analysis of 600 questions across three test sets, both ChatGPT and BingAI demonstrated competence in answering NEET-PG style questions, achieving passing scores. However, BingAI consistently outperformed ChatGPT, exhibiting higher accuracy rates across all three question banks. The statistical comparison indicated significant differences in correct answer rates between the two models.
Conclusions: The study concludes that both ChatGPT and BingAI have the potential to serve as effective study aids for medical licensing exams. While both models showed competence in answering questions, BingAI consistently outperformed ChatGPT, achieving higher accuracy rates. Future improvements, including enhanced image interpretation, could further establish these large language models (LLMs) as valuable tools in both educational and clinical settings.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747852
DOI: http://dx.doi.org/10.7759/cureus.76108
Cureus, December 2024
Internal Medicine, Ross University School of Medicine, Saint Michael, BRB.
Cureus, September 2024
Department of Neurosurgery, Thomas Jefferson Medical College, Philadelphia, USA.
J Craniofac Surg, September 2024
Department of Surgery, Division of Plastic Surgery, Nemours Children's Hospital, Wilmington, DE.
Background: The advent of Large Language Models (LLMs) like ChatGPT has introduced significant advancements in various surgical disciplines. These developments have led to an increased interest in the utilization of LLMs for Current Procedural Terminology (CPT) coding in surgery. With CPT coding being a complex and time-consuming process, often exacerbated by the scarcity of professional coders, there is a pressing need for innovative solutions to enhance coding efficiency and accuracy.
Children (Basel), June 2024
Department of Ophthalmology, Izmir Tinaztepe University, Izmir 35400, Turkey.
Background: Large language models (LLMs) are becoming increasingly important as they are used more frequently to provide medical information. Our aim is to evaluate the effectiveness of artificial intelligence (AI) LLMs, such as ChatGPT-4, BingAI, and Gemini, in responding to patient inquiries about retinopathy of prematurity (ROP).
Methods: The answers given by the LLMs to fifty real-life patient inquiries were assessed by three ophthalmologists using a 5-point Likert scale.
Wound Repair Regen, November 2024
Department of Plastic and Reconstructive Surgery, The University of Tokyo Hospital, Tokyo, Japan.
To evaluate the accuracy of AI chatbots in staging pressure injuries according to the National Pressure Injury Advisory Panel (NPIAP) Staging through clinical image interpretation, a cross-sectional study was conducted to assess five leading publicly available AI chatbots. Three of the chatbots were unable to interpret the clinical images, whereas GPT-4 Turbo achieved a high accuracy rate (83.0%) in staging pressure injuries, notably outperforming BingAI Creative mode (24.