Purpose: The integration of artificial intelligence (AI) into medical education has progressed significantly, particularly in the domain of language models. This study assesses the performance of two notable language models, ChatGPT and BingAI Precise, in answering National Eligibility Entrance Test for Postgraduates (NEET-PG)-style practice questions that simulate medical exam formats.

Methods: A cross-sectional study conducted in June 2023 involved assessing ChatGPT and BingAI Precise using three sets of NEET-PG practice exams, comprising 200 questions each. The questions were categorized by difficulty levels (easy, moderate, difficult), excluding those with images or tables. The AI models' responses were compared to reference answers provided by the Dr. Bhatia Medical Coaching Institute (DBMCI). Statistical analysis was employed to evaluate accuracy, coherence, and overall performance across different difficulty levels.

Results: In the analysis of 600 questions across three test sets, both ChatGPT and BingAI demonstrated competence in answering NEET-PG-style questions, achieving passing scores. However, BingAI consistently outperformed ChatGPT, exhibiting higher accuracy rates across all three question banks. The statistical comparison indicated significant differences in correct-answer rates between the two models.

Conclusions: The study concludes that both ChatGPT and BingAI have the potential to serve as effective study aids for medical licensing exams. While both models answered questions competently, BingAI consistently outperformed ChatGPT, reflecting its higher accuracy rates. Future improvements, including enhanced image interpretation, could further establish these large language models (LLMs) as valuable tools in both educational and clinical settings.
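The abstract reports a statistical comparison of correct-answer rates between the two models without naming the test used. One common choice for comparing two proportions is a two-proportion z-test; the sketch below is illustrative only. The counts (150/200 vs. 170/200) are hypothetical placeholders, not the study's data, and `two_proportion_z_test` is a helper written for this example.

```python
import math

def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Return (z, two-sided p) for H0: the two accuracy rates are equal."""
    p_a, p_b = correct_a / n_a, correct_b / n_b
    # Pooled proportion under the null hypothesis of equal rates
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via math.erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical example: model A answers 150/200 correctly, model B 170/200
z, p = two_proportion_z_test(150, 200, 170, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these placeholder counts the difference would be significant at the 0.05 level; the actual significance reported in the study depends on its own counts and chosen test.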

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11747852
DOI: http://dx.doi.org/10.7759/cureus.76108

Similar Publications

Article Synopsis
  • There has been a growing number of cervical fusion surgeries in the U.S., but there is a lack of research on how well surgeons follow evidence-based medicine (EBM) guidelines, particularly as patients turn to large language models (LLMs) for decision-making assistance.
  • An observational study tested four LLMs (Bard, BingAI, ChatGPT-3.5, and ChatGPT-4) against the 2023 North American Spine Society (NASS) cervical fusion guidelines and found that none fully adhered, with only ChatGPT-4 and Bing Chat achieving 60% compliance.
  • The findings suggest a need for better training of LLMs on clinical guidelines and highlight the necessity of

Background: The advent of Large Language Models (LLMs) like ChatGPT has introduced significant advancements in various surgical disciplines. These developments have led to an increased interest in the utilization of LLMs for Current Procedural Terminology (CPT) coding in surgery. With CPT coding being a complex and time-consuming process, often exacerbated by the scarcity of professional coders, there is a pressing need for innovative solutions to enhance coding efficiency and accuracy.


Background: Large language models (LLMs) are becoming increasingly important as they are used more frequently to provide medical information. Our aim is to evaluate the effectiveness of AI LLMs such as ChatGPT-4, BingAI, and Gemini in responding to patient inquiries about retinopathy of prematurity (ROP).

Methods: The answers of LLMs for fifty real-life patient inquiries were assessed using a 5-point Likert scale by three ophthalmologists.


To evaluate the accuracy of AI chatbots in staging pressure injuries according to National Pressure Injury Advisory Panel (NPIAP) staging through clinical image interpretation, a cross-sectional study assessed five leading publicly available AI chatbots. Three of the chatbots were unable to interpret the clinical images, whereas GPT-4 Turbo achieved a high accuracy rate (83.0%) in staging pressure injuries, notably outperforming BingAI Creative mode (24.
