Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.

Clin Rheumatol

Department of Physical Medicine and Rehabilitation, Kanuni Sultan Süleyman Training and Research Hospital, University of Health Sciences, Istanbul, Turkey.

Published: November 2024

Objectives: This study evaluates the performance of two AI models, ChatGPT-4o and Google Gemini, in answering rheumatology board-level questions and compares their effectiveness, reliability, and applicability in clinical practice.

Method: A cross-sectional study was conducted using 420 rheumatology questions from the BoardVitals question bank, excluding 27 questions based on visual data. Both AI models categorized the questions by difficulty (easy, medium, hard) and answered them. Reliability was assessed by asking the questions a second time. The accuracy, reliability, and difficulty categorization of the models' responses were then analyzed.

Results: ChatGPT-4o answered 86.9% of the questions correctly, significantly outperforming Google Gemini's 60.2% accuracy (p < 0.001). When the questions were asked a second time, the success rate was 86.7% for ChatGPT-4o and 60.5% for Google Gemini. Both models mainly categorized questions as medium difficulty. ChatGPT-4o showed higher accuracy in various rheumatology subfields, notably in Basic and Clinical Science (p = 0.028), Osteoarthritis (p = 0.023), and Rheumatoid Arthritis (p < 0.001).
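The headline comparison above can be checked with a short calculation. The following is a minimal sketch, not the authors' analysis: the abstract does not name the statistical test used, so a pooled two-proportion z-test is assumed here for illustration. The counts of correct answers (365 and 253 out of 420) are also assumptions, back-calculated from the reported 86.9% and 60.2%; the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(correct_a, correct_b, n):
    """Pooled two-proportion z-test for two groups of equal size n.

    Returns the z statistic and the two-sided p-value.
    """
    p_a = correct_a / n
    p_b = correct_b / n
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: counts are back-calculated from the reported percentages,
# they are not stated in the abstract.
N_QUESTIONS = 420
CHATGPT_CORRECT = 365  # 365 / 420 rounds to 86.9%
GEMINI_CORRECT = 253   # 253 / 420 rounds to 60.2%

z, p = two_proportion_z_test(CHATGPT_CORRECT, GEMINI_CORRECT, N_QUESTIONS)
print(f"ChatGPT-4o accuracy: {CHATGPT_CORRECT / N_QUESTIONS:.1%}")
print(f"Google Gemini accuracy: {GEMINI_CORRECT / N_QUESTIONS:.1%}")
print(f"z = {z:.2f}, p < 0.001: {p < 0.001}")
```

Because both models answered the same questions, a paired test such as McNemar's would arguably fit the design better, but it needs per-question results that the abstract does not report.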

Conclusions: ChatGPT-4o significantly outperformed Google Gemini in rheumatology board-level questions. This demonstrates the success of ChatGPT-4o in situations requiring complex and specialized knowledge related to rheumatological diseases. The performance of both AI models decreased as the question difficulty increased. This study demonstrates the potential of AI in clinical applications and suggests that its use as a tool to assist clinicians may improve healthcare efficiency in the future. Future studies using real clinical scenarios and real board questions are recommended.

Key Points
• ChatGPT-4o significantly outperformed Google Gemini in answering rheumatology board-level questions, achieving 86.9% accuracy compared to Google Gemini's 60.2%.
• For both AI models, the correct answer rate decreased as the question difficulty increased.
• The study demonstrates the potential for AI models to be used in clinical practice as a tool to assist clinicians and improve healthcare efficiency.

Source
http://dx.doi.org/10.1007/s10067-024-07154-5

Similar Publications

Deep learning-based design and experimental validation of a medicine-like human antibody library.

Brief Bioinform

November 2024

Biotherapeutics Molecule Discovery, Boehringer Ingelheim Pharmaceutical Inc., 900 Ridgebury Road, Ridgefield, CT 06877, United States.

Antibody generation requires the use of one or more time-consuming methods, namely animal immunization and in vitro display technologies. However, the recent availability of large amounts of antibody sequence and structural data in the public domain, along with the advent of generative deep learning algorithms, raises the possibility of computationally generating novel antibody sequences with desirable developability attributes. Here, we describe a deep learning model for computationally generating libraries of highly human antibody variable regions whose intrinsic physicochemical properties resemble those of the variable regions of the marketed antibody-based biotherapeutics (medicine-likeness).

Accurate survival prediction of patients with long-bone metastases is challenging, but important for optimizing treatment. The Skeletal Oncology Research Group (SORG) machine learning algorithm (MLA) has been previously developed and internally validated to predict 90-day and 1-year survival. External validation showed promise in the United States and Taiwan.

AI comes to the Nobel Prize and drug discovery.

J Pharm Anal

November 2024

College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, State Key Laboratory of Advanced Drug Delivery and Release Systems, Zhejiang University, Hangzhou 310058, China.

The association between total social exposure and incident multimorbidity: A population-based cohort study.

SSM Popul Health

March 2025

Dalla Lana School of Public Health, University of Toronto, Health Sciences Building, 155 College Street, 6th Floor, Toronto, Ontario, M5T 3M7, Canada.

Background: Multimorbidity, the co-occurrence of two or more chronic conditions, is associated with the social determinants of health. Using comprehensive linked population-representative data, we sought to understand the combined effect of multiple social determinants on multimorbidity incidence in Ontario, Canada.

Methods: Ontario respondents aged 20-55 in the 2001-2011 cycles of the Canadian Community Health Survey were linked to administrative health data to ascertain multimorbidity status until 2022.

Multimodal artificial intelligence system for detecting a small esophageal high-grade squamous intraepithelial neoplasia: A case report.

World J Gastrointest Endosc

January 2025

Department of Gastroenterology and Hepatology, West China Hospital, Sichuan University, Chengdu 610041, Sichuan Province, China.

Background: Recent advancements in artificial intelligence (AI) have significantly enhanced the capabilities of endoscopic-assisted diagnosis for gastrointestinal diseases. AI has shown great promise in clinical practice, particularly for diagnostic support, offering real-time insights into complex conditions such as esophageal squamous cell carcinoma.

Case Summary: In this study, we introduce a multimodal AI system that successfully identified and delineated a small and flat carcinoma during esophagogastroduodenoscopy, highlighting its potential for early detection of malignancies.
