Background: Large language models (LLMs) have emerged as powerful tools capable of processing and generating human-like text. These LLMs, such as ChatGPT (OpenAI Incorporated, Mission District, San Francisco, United States), Google Bard (Alphabet Inc., CA, US), and Microsoft Bing (Microsoft Corporation, WA, US), have been applied across various domains, demonstrating their potential to assist in solving complex tasks and improving information accessibility. However, their application to solving case vignettes in physiology has not been explored. This study aimed to assess the performance of three LLMs, namely ChatGPT (3.5; free research version), Google Bard (Experiment), and Microsoft Bing (Precise), in answering case vignettes in physiology.

Methods: This cross-sectional study was conducted in July 2023. A total of 77 case vignettes in physiology were prepared by two physiologists and validated by two other content experts. These cases were presented to each LLM, and their responses were collected. Two physiologists independently rated the answers provided by the LLMs for accuracy on a scale from 0 to 4 according to the structure of observed learning outcome (SOLO) taxonomy (pre-structural = 0, uni-structural = 1, multi-structural = 2, relational = 3, extended abstract = 4). Scores were compared across the LLMs with Friedman's test, and inter-observer agreement was assessed with the intraclass correlation coefficient (ICC).

Results: Across the 77 cases, the overall scores for ChatGPT, Bing, and Bard were 3.19±0.3, 2.15±0.6, and 2.91±0.5, respectively (p<0.0001). Hence, ChatGPT 3.5 (free version) obtained the highest score, Bing (Precise) the lowest, and Bard (Experiment) fell between the two. The average ICC values for ChatGPT, Bing, and Bard were 0.858 (95% CI: 0.777 to 0.91, p<0.0001), 0.975 (95% CI: 0.961 to 0.984, p<0.0001), and 0.964 (95% CI: 0.944 to 0.977, p<0.0001), respectively.

Conclusion: ChatGPT outperformed Bard and Bing in answering case vignettes in physiology. Students and teachers may therefore choose among LLMs accordingly for case-based learning in physiology. Further exploration of their capabilities is needed before these models are adopted in medical education and clinical decision support.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10475852
DOI: http://dx.doi.org/10.7759/cureus.42972
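For readers curious about the analysis described in the Methods above, here is a minimal Python sketch, not the authors' actual code: it runs Friedman's test across three related score samples and computes the ICC for two raters via the pingouin library (assumed to be installed). All score arrays are randomly generated placeholders, not the study data.

```python
# Minimal sketch of the statistics described in the Methods: Friedman's test
# across the three LLMs and an ICC for two raters. Scores are illustrative
# placeholders on the 0-4 SOLO scale, NOT the published study data.
import numpy as np
import pandas as pd
from scipy.stats import friedmanchisquare
import pingouin as pg  # assumed available; provides intraclass_corr

rng = np.random.default_rng(0)
n_cases = 77  # same number of case vignettes as the study

# Hypothetical per-case scores for each model.
chatgpt = rng.integers(2, 5, n_cases)
bing = rng.integers(1, 4, n_cases)
bard = rng.integers(2, 4, n_cases)

# Friedman's test: non-parametric comparison of three related samples
# (the same 77 cases scored for every model).
stat, p = friedmanchisquare(chatgpt, bing, bard)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4g}")

# ICC for inter-rater agreement on one model's answers: build a
# long-format table with one row per (case, rater) pair.
rater1 = chatgpt
rater2 = np.clip(chatgpt + rng.integers(-1, 2, n_cases), 0, 4)
long = pd.DataFrame({
    "case": np.tile(np.arange(n_cases), 2),
    "rater": np.repeat(["R1", "R2"], n_cases),
    "score": np.concatenate([rater1, rater2]),
})
icc = pg.intraclass_corr(data=long, targets="case",
                         raters="rater", ratings="score")
print(icc[["Type", "ICC", "CI95%", "pval"]])
```

The abstract does not state which ICC model the authors used; pingouin reports all six common variants (ICC1 through ICC3k), so the relevant row would depend on that choice.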
Heliyon
December 2024
Department of Emergency Medicine, Arrowhead Regional Medical Center, 400 N. Pepper Ave, Colton, CA, 92324, USA.
Background: Large language models (LLMs) such as ChatGPT-4 (CG4) are proving to be valuable tools in the medical field, not only in facilitating administrative tasks but also in augmenting medical decision-making. LLMs have previously been tested for diagnostic accuracy with expert-generated questions and standardized test data. Among those studies, CG4 consistently outperformed alternative LLMs, including ChatGPT-3.
JMIR Form Res
December 2024
Department of Dermatology and Allergy, Technical University of Munich, Munich, Germany.
Background: The rapid development of large language models (LLMs) such as OpenAI's ChatGPT has significantly impacted medical research and education. These models have shown potential in fields ranging from radiological imaging interpretation to medical licensing examination assistance. Recently, LLMs have been enhanced with image recognition capabilities.
World J Methodol
December 2024
Department of Orthopaedics, ACS Medical College and Hospital, Dr MGR Educational and Research Institute, Chennai 600077, Tamil Nadu, India.
Background: Medication errors, especially in dosage calculation, pose risks in healthcare. Artificial intelligence (AI) systems like ChatGPT and Google Bard may help reduce errors, but their accuracy in providing medication information remains to be evaluated.
Aim: To evaluate the accuracy of AI systems (ChatGPT 3.
Cureus
December 2024
Oral and Maxillofacial Surgery, Kings College Hospital, London, GBR.
J Am Acad Orthop Surg
December 2024
From the University of California, Davis, Sacramento, CA (Simister, Le, Meehan, Leshikar, Saiz, and Lum), the San Joaquin General Hospital, French Camp, CA (Huish), the Cedars Sinai, Los Angeles, CA (Tsai), and the Yale University, New Haven, CT (Halim and Tuason).
Introduction: The introduction of generative artificial intelligence (AI) may have a profound effect on residency applications. In this study, we explore the abilities of AI-generated letters of recommendation (LORs) by evaluating the accuracy of orthopaedic surgery residency selection committee members to identify LORs written by human or AI authors.
Methods: In a multicenter, single-blind trial, a total of 45 LORs (15 human, 15 ChatGPT, and 15 Google BARD) were curated.