Comparative analysis of ChatGPT and Bard in answering pathology examination questions requiring image interpretation.

Am J Clin Pathol

Division of Pathology, Chulabhorn International College of Medicine, Thammasat University, Pathum Thani, Thailand.

Published: September 2024

Objectives: To evaluate the accuracy of ChatGPT and Bard in answering pathology examination questions requiring image interpretation.

Methods: The study evaluated the performance of ChatGPT-4 and Bard on 86 multiple-choice questions: 17 (19.8%) on general pathology and 69 (80.2%) on systemic pathology. Of these, 62 (72.1%) included microscopic images, and 57 (66.3%) were first-order questions focused on diagnosing the disease. The authors presented these artificial intelligence (AI) tools with the questions, both with and without clinical context, and assessed their answers against a reference standard set by pathologists.

Results: ChatGPT-4 achieved a 100% (n = 86) accuracy rate in questions with clinical context, surpassing Bard's 87.2% (n = 75). Without context, the accuracy of both AI tools declined significantly, with ChatGPT-4 at 52.3% (n = 45) and Bard at 38.4% (n = 33). ChatGPT-4 consistently outperformed Bard across various categories, particularly in systemic pathology and first-order questions. A notable issue identified was Bard's tendency to "hallucinate" or provide plausible but incorrect answers, especially without clinical context.
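As a quick check, the accuracy rates reported above follow directly from the counts of correct answers over the 86 questions. The short Python sketch below reproduces that arithmetic; the counts per condition are taken from the abstract, while the helper function and variable names are illustrative only.

# Reproduce the accuracy percentages reported in the Results section.
# Correct-answer counts (out of 86 questions) are taken from the abstract;
# the helper function itself is only an illustrative sketch.

TOTAL_QUESTIONS = 86

correct_answers = {
    ("ChatGPT-4", "with clinical context"): 86,
    ("Bard", "with clinical context"): 75,
    ("ChatGPT-4", "without clinical context"): 45,
    ("Bard", "without clinical context"): 33,
}

def accuracy(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

for (model, condition), correct in correct_answers.items():
    print(f"{model}, {condition}: {accuracy(correct)}% ({correct}/{TOTAL_QUESTIONS})")

# Expected output matches the abstract: 100.0%, 87.2%, 52.3%, 38.4%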

Conclusions: This study demonstrated the potential of ChatGPT and Bard in pathology education, stressing the importance of clinical context for accurate AI interpretations of pathology images. It underlined the need for careful AI integration in medical education.


Source: http://dx.doi.org/10.1093/ajcp/aqae036

Publication Analysis

Top Keywords

chatgpt bard: 12
bard answering: 8
answering pathology: 8
pathology examination: 8
examination questions: 8
questions requiring: 8
requiring image: 8
systemic pathology: 8
first-order questions: 8
questions clinical: 8

Similar Publications

Background: Large language models (LLMs) such as ChatGPT-4 (CG4) are proving to be valuable tools in the medical field, not only in facilitating administrative tasks but also in augmenting medical decision-making. LLMs have previously been tested for diagnostic accuracy with expert-generated questions and standardized test data. Among those studies, CG4 consistently outperformed alternative LLMs, including ChatGPT-3.


Readability, quality and accuracy of generative artificial intelligence chatbots for commonly asked questions about labor epidurals: a comparison of ChatGPT and Bard.

Int J Obstet Anesth

December 2024

Department of Anesthesiology, 8700 Beverly Blvd #4209, Cedars-Sinai Medical Center, Los Angeles, CA 90064, United States.

Introduction: Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy, and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.

Methods: Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites.


Artificial intelligence (AI) is becoming increasingly influential in ophthalmology, particularly through advancements in machine learning, deep learning, robotics, neural networks, and natural language processing (NLP). Among these, NLP-based chatbots are the most readily accessible and are driven by AI-based large language models (LLMs). These chatbots have facilitated new research avenues and have gained traction in both clinical and surgical applications in ophthalmology.


Background: The rapid development of large language models (LLMs) such as OpenAI's ChatGPT has significantly impacted medical research and education. These models have shown potential in fields ranging from radiological imaging interpretation to medical licensing examination assistance. Recently, LLMs have been enhanced with image recognition capabilities.


Background: Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated.

Aim: To evaluate the accuracy of AI systems (ChatGPT 3.

