Use of Online Large Language Model Chatbots in Cornea Clinics.

Cornea

Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada.

Published: December 2024

AI Article Synopsis

  • This study evaluated the performance of various large language model (LLM) chatbots, including ChatGPT and Google Bard, in responding to cornea-related medical scenarios to assess their effectiveness in clinical settings.
  • ChatGPT performed best overall, scoring an average of 83.8% across the evaluation criteria, while Google Bard excelled on readability metrics.
  • The research underscores the potential benefits of LLMs in ophthalmology but emphasizes the need for careful monitoring and professional oversight to ensure safety and effectiveness in patient care.

Article Abstract

Purpose: Online large language model (LLM) chatbots have garnered attention for their potential to enhance efficiency, provide education, and advance research. This study evaluated the performance of four LLM chatbots (Chat Generative Pre-Trained Transformer [ChatGPT], Writesonic, Google Bard, and Bing Chat) in responding to cornea-related scenarios.

Methods: Prompts covering clinic administration, patient counseling, treatment algorithms, surgical management, and research were devised. Responses from the LLMs were assessed by 3 fellowship-trained cornea specialists, blinded to the source LLM, using a standardized rubric evaluating accuracy, comprehension, compassion, professionalism, humanness, comprehensiveness, and overall quality. In addition, 12 readability metrics were used to further evaluate the responses. Scores were averaged and ranked, and subgroup analyses were performed to identify the best-performing LLM for each rubric criterion.
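
An editorial aside on reproducibility: the listing does not name the 12 readability metrics, the rubric's point scale, or the aggregation method. The minimal Python sketch below therefore uses common formulas from the open-source textstat package as stand-ins for the unspecified metrics and assumes a 0-4 rubric scale with simple mean aggregation; all function names and values are illustrative, not the study's actual protocol.

    # Minimal sketch of the evaluation pipeline described in Methods.
    # Assumptions (not stated in the listing): choice of readability
    # formulas, a 0-4 rubric scale, and simple mean aggregation.
    import statistics

    import textstat  # pip install textstat

    RUBRIC_CRITERIA = [
        "accuracy", "comprehension", "compassion", "professionalism",
        "humanness", "comprehensiveness", "overall quality",
    ]

    # Stand-in readability formulas; the study used 12 unspecified metrics.
    READABILITY_METRICS = {
        "flesch_reading_ease": textstat.flesch_reading_ease,
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade,
        "gunning_fog": textstat.gunning_fog,
        "smog_index": textstat.smog_index,
        "coleman_liau_index": textstat.coleman_liau_index,
        "automated_readability_index": textstat.automated_readability_index,
    }

    def readability_profile(response: str) -> dict:
        """Score one chatbot response on every readability formula."""
        return {name: fn(response) for name, fn in READABILITY_METRICS.items()}

    def mean_rubric_score(grader_scores: list) -> float:
        """Average the blinded graders' rubric scores for one response.

        Each grader supplies one 0-4 score per criterion (scale assumed).
        """
        per_criterion = [
            statistics.mean(g[c] for g in grader_scores)
            for c in RUBRIC_CRITERIA
        ]
        return statistics.mean(per_criterion)

    # Illustrative values only: one response scored by three graders.
    graders = [{c: 3.0 for c in RUBRIC_CRITERIA} for _ in range(3)]
    print(mean_rubric_score(graders))  # -> 3.0
    print(readability_profile("The cornea is the clear front window of the eye."))

Ranking the chatbots then reduces to sorting their mean scores per criterion, mirroring the subgroup analyses described above.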

Results: Sixty-six responses were generated from 11 prompts. ChatGPT outperformed the other LLMs across all rubric criteria, achieving an overall response score of 3.35 ± 0.42 (83.8%). However, Google Bard excelled in readability, leading in 75% of the metrics assessed. Importantly, no responses were judged to pose a risk to patients, supporting the safety and reliability of the information provided.
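
A brief note on the arithmetic: the reported 83.8% is consistent with the 3.35 mean score on a rubric with a 4-point maximum. The maximum is not stated in this listing, so the check below treats it as an assumption.

    # Assumed (not stated in the listing): the rubric maxes out at 4 points.
    mean_score, assumed_max = 3.35, 4.0
    print(f"{mean_score / assumed_max:.1%}")  # -> 83.8%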

Conclusions: ChatGPT demonstrated superior accuracy and comprehensiveness in responding to cornea-related prompts, whereas Google Bard stood out for its readability. The study highlights the potential of LLMs in streamlining various clinical, administrative, and research tasks in ophthalmology. Future research should incorporate patient feedback and ongoing data collection to monitor LLM performance over time. Despite their promise, LLMs should be used with caution, necessitating continuous oversight by medical professionals and standardized evaluations to ensure patient safety and maximize benefits.

Source
http://dx.doi.org/10.1097/ICO.0000000000003747

Publication Analysis

Top Keywords

google bard (12), online large (8), large language (8), language model (8), responding cornea-related (8), llm (5), model chatbots (4), chatbots cornea (4), cornea clinics (4), clinics purpose (4)

Similar Publications

Background: It is recognised that large language models (LLMs) may aid medical education by supporting the understanding of the explanations behind answers to multiple choice questions (MCQs). This study aimed to evaluate the efficacy of the LLM chatbots ChatGPT and Bard in answering an Intermediate Life Support pre-course MCQ test developed by the Resuscitation Council UK, focused on managing deteriorating patients and on identifying the causes of, and treating, cardiac arrest. We assessed the accuracy of the responses and the quality of the explanations to evaluate the utility of the chatbots.


Can Artificial Intelligence Deceive Residency Committees? A Randomized Multicenter Analysis of Letters of Recommendation.

J Am Acad Orthop Surg

December 2024

From the University of California, Davis, Sacramento, CA (Simister, Le, Meehan, Leshikar, Saiz, and Lum), the San Joaquin General Hospital, French Camp, CA (Huish), the Cedars Sinai, Los Angeles, CA (Tsai), and the Yale University, New Haven, CT (Halim and Tuason).

Introduction: The advent of generative artificial intelligence (AI) may have a profound effect on residency applications. In this study, we explore AI-generated letters of recommendation (LORs) by evaluating how accurately orthopaedic surgery residency selection committee members can identify LORs written by human or AI authors.

Methods: In a multicenter, single-blind trial, a total of 45 LORs (15 human, 15 ChatGPT, and 15 Google BARD) were curated.


Background: Large language models (LLMs) are increasingly explored in healthcare and education. In medical education, they hold the potential to enhance learning by supporting personalized teaching, resource development, and student engagement. However, LLM use also raises concerns about ethics, accuracy, and reliance.


Background: Artificial intelligence-based language model chatbots are increasingly being used as a quick reference for healthcare-related information. In pediatric orthopaedics, studies have shown that a significant percentage of parents use online search engines to learn more about their children's health conditions. Several studies have investigated the accuracy of the responses generated by these chatbots.

