Objectives: To evaluate the accuracy of ChatGPT and Bard in answering pathology examination questions requiring image interpretation.
Methods: The study evaluated the performance of ChatGPT-4 and Bard on 86 multiple-choice questions, of which 17 (19.8%) covered general pathology and 69 (80.2%) covered systemic pathology. Of these, 62 (72.1%) included microscopic images, and 57 (66.3%) were first-order questions focusing on diagnosing the disease. The authors presented the questions to these artificial intelligence (AI) tools both with and without clinical context and assessed the answers against a reference standard set by pathologists.
Results: ChatGPT-4 achieved a 100% (n = 86) accuracy rate in questions with clinical context, surpassing Bard's 87.2% (n = 75). Without context, the accuracy of both AI tools declined significantly, with ChatGPT-4 at 52.3% (n = 45) and Bard at 38.4% (n = 33). ChatGPT-4 consistently outperformed Bard across various categories, particularly in systemic pathology and first-order questions. A notable issue identified was Bard's tendency to "hallucinate" or provide plausible but incorrect answers, especially without clinical context.
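The accuracy figures above follow directly from the reported counts of correct answers out of 86 questions. A minimal sketch of that arithmetic, assuming each percentage is simply correct answers divided by total questions and rounded to one decimal place (the names `TOTAL_QUESTIONS`, `correct`, and `accuracy_pct` are illustrative and not taken from the study):

```python
# Recompute the reported accuracy rates from the counts given in the Results.
# Assumption: accuracy = correct answers / total questions, rounded to 1 decimal.

TOTAL_QUESTIONS = 86  # total multiple-choice questions, as stated in the Methods

# Number of correct answers per tool and condition, as reported in the Results.
correct = {
    ("ChatGPT-4", "with context"): 86,
    ("Bard", "with context"): 75,
    ("ChatGPT-4", "without context"): 45,
    ("Bard", "without context"): 33,
}


def accuracy_pct(n_correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * n_correct / total, 1)


for (tool, condition), n in correct.items():
    print(f"{tool} ({condition}): {n}/{TOTAL_QUESTIONS} = {accuracy_pct(n)}%")
```

Under this assumption the recomputed values agree with the percentages reported in the abstract.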
Conclusions: This study demonstrated the potential of ChatGPT and Bard in pathology education, stressing the importance of clinical context for accurate AI interpretations of pathology images. It underlined the need for careful AI integration in medical education.
DOI: http://dx.doi.org/10.1093/ajcp/aqae036
Heliyon
December 2024
Department of Emergency Medicine, Arrowhead Regional Medical Center, 400 N. Pepper Ave, Colton, CA, 92324, USA.
Background: Large language models (LLMs) such as ChatGPT-4 (CG4) are proving to be valuable tools in the medical field, not only in facilitating administrative tasks, but in augmenting medical decision-making. LLMs have previously been tested for diagnostic accuracy with expert-generated questions and standardized test data. Among those studies, CG4 consistently outperformed alternative LLMs, including ChatGPT-3.
Int J Obstet Anesth
December 2024
Department of Anesthesiology, 8700 Beverly Blvd #4209, Cedars-Sinai Medical Center, Los Angeles, CA 90064, United States.
Introduction: Over 90% of pregnant women and 76% of expectant fathers search for pregnancy health information. We examined the readability, accuracy, and quality of answers to common obstetric anesthesia questions from the popular generative artificial intelligence (AI) chatbots ChatGPT and Bard.
Methods: Twenty questions for generative AI chatbots were derived from frequently asked questions based on professional society, hospital and consumer websites.
J Pers Med
December 2024
Department of Clinical Research, University of Southern Denmark, 5230 Odense, Denmark.
Artificial intelligence (AI) is becoming increasingly influential in ophthalmology, particularly through advancements in machine learning, deep learning, robotics, neural networks, and natural language processing (NLP). Among these, NLP-based chatbots are the most readily accessible and are driven by AI-based large language models (LLMs). These chatbots have facilitated new research avenues and have gained traction in both clinical and surgical applications in ophthalmology.
JMIR Form Res
December 2024
Department of Dermatology and Allergy, Technical University of Munich, Munich, Germany.
Background: The rapid development of large language models (LLMs) such as OpenAI's ChatGPT has significantly impacted medical research and education. These models have shown potential in fields ranging from radiological imaging interpretation to medical licensing examination assistance. Recently, LLMs have been enhanced with image recognition capabilities.
World J Methodol
December 2024
Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India.
Background: Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated.
Aim: To evaluate the accuracy of AI systems (ChatGPT 3.