Comparison of Large Language Models in Diagnosis and Management of Challenging Clinical Cases.

Sujeeth Krishna Shanmugam, David J Browning

Clin Ophthalmol

Department of Ophthalmology, Wake Forest University School of Medicine, Winston-Salem, NC, USA.

Published: November 2024

Purpose: Compare large language models (LLMs) in analyzing and responding to a difficult series of ophthalmic cases.

Design: A comparative case series in which LLMs that met inclusion criteria were tested on twenty difficult case studies posed in open-text format.

Methods: Fifteen LLMs accessible to ophthalmologists were tested against twenty case studies published in JAMA Ophthalmology. Each case was presented to each LLM as identical, open-ended text, and open-ended responses regarding differential diagnosis, next diagnostic tests, and recommended treatments were requested. Responses were recorded and assessed for accuracy against the published correct answers. The main outcome was the accuracy of the LLMs against the correct answers. Secondary outcomes included comparative performance on the differential diagnosis, ancillary testing, and treatment subtests, and the readability of responses.

Results: Scores were normally distributed and ranged from 0 to 35 (out of a maximum score of 60), with a mean ± standard deviation of 19 ± 9. Scores for three of the LLMs (ChatGPT 3.5, Claude Pro, and Copilot Pro) were statistically significantly higher than the mean. Two of the high-performing LLMs required a paid subscription (Claude Pro and Copilot Pro) and one was free (ChatGPT 3.5). While there were no clinical or statistical differences between ChatGPT 3.5 and Claude Pro, Copilot Pro was separated from the other highly ranked LLMs by +5 points, or 0.56 standard deviations. The readability of responses from all tested programs was above the eighth-grade reading level that the American Medical Association (AMA) recommends for public consumers.
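The arithmetic behind the figures in the Results can be checked with a short sketch. It uses only the numbers quoted in the abstract (mean 19, standard deviation 9, maximum 60, top score 35, separation of 5 points). The helper names are hypothetical, and this is not the authors' analysis code.

```python
# Summary figures quoted in the Results paragraph of the abstract.
MEAN_SCORE = 19.0         # mean total score across the LLMs
SD_SCORE = 9.0            # standard deviation of the total scores
MAX_SCORE = 60.0          # maximum attainable score
TOP_SCORE = 35.0          # highest score observed
SEPARATION_POINTS = 5.0   # gap between Copilot Pro and the other top LLMs


def in_sd_units(points: float, sd: float) -> float:
    """Express a raw point difference as a multiple of the standard deviation."""
    return points / sd


def percent_of_max(score: float, max_score: float) -> float:
    """Express a raw score as a percentage of the maximum attainable score."""
    return 100.0 * score / max_score


separation_sd = in_sd_units(SEPARATION_POINTS, SD_SCORE)
mean_pct = percent_of_max(MEAN_SCORE, MAX_SCORE)
top_pct = percent_of_max(TOP_SCORE, MAX_SCORE)

print(f"Separation: {separation_sd:.2f} SD")     # Separation: 0.56 SD
print(f"Mean score: {mean_pct:.1f}% of maximum")  # Mean score: 31.7% of maximum
print(f"Top score:  {top_pct:.1f}% of maximum")   # Top score:  58.3% of maximum
```

Dividing the 5-point gap by the 9-point standard deviation reproduces the reported 0.56 standard deviations. Expressed as percentages, the mean score is under a third of the maximum and the best score is under 60%.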

Conclusion: Subscription LLMs were more prevalent among the highly ranked LLMs, suggesting that these perform better as ophthalmic assistants. While readability was poor for the average person, the content was understood by a board-certified ophthalmologist. The accuracy of LLMs is not high enough to recommend them for patient care in standalone mode, but their use in aiding clinicians with patient care and preventing oversights is promising.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568767	PMC
http://dx.doi.org/10.2147/OPTH.S488232	DOI Listing


Similar Publications

Recognition and cleavage of human tRNA methyltransferase TRMT1 by the SARS-CoV-2 main protease.

Elife

January 2025

Department of Chemistry & Biochemistry, University of Delaware, Newark, United States.

Angel D'Oliviera, Xuhang Dai, Saba Mottaghinia, Sophie Olson, Evan P Geissler

The SARS-CoV-2 main protease (Mpro or Nsp5) is critical for production of viral proteins during infection and, like many viral proteases, also targets host proteins to subvert their cellular functions. Here, we show that the human tRNA methyltransferase TRMT1 is recognized and cleaved by SARS-CoV-2 Mpro. TRMT1 installs the N2,N2-dimethylguanosine (m2,2G) modification on mammalian tRNAs, which promotes cellular protein synthesis and redox homeostasis.


Dietary nitrate supplementation very slightly mitigates the oxidative stress induced by high-intensity training performed in normobaric hypoxia.

Biol Sport

January 2025

Institute of Sport Sciences, University of Lausanne, Lausanne, Switzerland.

Ana Sousa, Marie Chambion-Diaz, Vincent Pialoux, Romain Carin, João Luís Viana

Oxidative stress is augmented under hypoxic environments, which may be attenuated with antioxidant supplementation. We investigated the effects of dietary nitrate (NO3-) supplementation combined with high-intensity training performed under hypoxic conditions on antioxidant/pro-oxidant balance. Thirty trained participants were assigned to one of three groups - HNO: hypoxia (13% FiO2) + NO3-; HPL: hypoxia + placebo; CON: normoxia (20.


The In-depth Comparative Analysis of Four Large Language AI Models for Risk Assessment and Information Retrieval from Multi-Modality Prostate Cancer Work-up Reports.

World J Mens Health

December 2024

Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei, Taiwan.

Lun-Hsiang Yuan, Shi-Wei Huang, Dean Chou, Chung-You Tsai

Purpose: Information retrieval (IR) and risk assessment (RA) from multi-modality imaging and pathology reports are critical to prostate cancer (PC) treatment. This study aims to evaluate the performance of four general-purpose large language models (LLMs) in IR and RA tasks.

Materials and Methods: We conducted a study using simulated text reports from computed tomography, magnetic resonance imaging, bone scans, and biopsy pathology on stage IV PC patients.


Safety of baricitinib in vaccinated patients with severe and critical COVID-19 sub study of the randomised Bari-SolidAct trial.

EBioMedicine

December 2024

Research Institute of Internal Medicine, Oslo University Hospital Rikshospitalet, Oslo, Norway; Faculty of Medicine, Institute of Clinical Medicine, University of Oslo, Oslo, Norway; Section for Clinical Immunology and Infectious Diseases, Oslo University Hospital Rikshospitalet, Oslo, Norway.

Hans-Kittil Viermyr, Kristian Tonby, Erica Ponzi, Sophie Trouillet-Assant, Julien Poissy

Background: The Bari-SolidAct randomized controlled trial compared baricitinib with placebo in patients with severe COVID-19. A post hoc analysis revealed a higher incidence of serious adverse events (SAEs) among SARS-CoV-2-vaccinated participants who had received baricitinib. This sub-study aimed to investigate whether vaccination influences the safety profile of baricitinib in patients with severe COVID-19.


Tackling the Complexity of Spatial Transcriptomics Data Interpretation with Large Language Models.

bioRxiv

December 2024

Taushif Khan, Colleen M Farley, John J Wilson, Chih-Hao Chang, Damien Chaussabel

Article Synopsis

Spatial transcriptomics provides valuable insights into tissue cellular landscapes, especially in cancer research focusing on the tumor microenvironment, but poses challenges in data interpretation.
This study evaluates Large Language Models (LLMs) to improve the analysis of spatial transcriptomic data from a murine melanoma model, finding that most models struggled, with Claude 3.5 Sonnet performing best for identifying gene expression patterns.
The LLM's capabilities facilitated a systematic workflow to analyze the tumor immune landscape, revealing complex immunosuppressive mechanisms and enhancing understanding of tumor immunology while highlighting the need for further development to scale such approaches.
