Purpose: To compare the performance of large language models (LLMs) in analyzing and responding to a series of difficult ophthalmic cases.

Design: A comparative case series in which LLMs that met inclusion criteria were tested on twenty difficult case studies posed in open-text format.

Methods: Fifteen LLMs accessible to ophthalmologists were tested against twenty case studies published in JAMA Ophthalmology. Each case was presented to each LLM as identical open-ended text, and open-ended responses regarding the differential diagnosis, next diagnostic tests, and recommended treatments were requested. Responses were recorded and assessed for accuracy against the published correct answers. The main outcome was the accuracy of the LLMs against the correct answers. Secondary outcomes included comparative performance on the differential diagnosis, ancillary testing, and treatment subtests, and the readability of responses.

Results: Scores were normally distributed and ranged from 0 to 35 (out of a maximum possible score of 60), with a mean ± standard deviation of 19 ± 9. Scores for three of the LLMs (ChatGPT 3.5, Claude Pro, and Copilot Pro) were statistically significantly higher than the mean. Two of the high-performing LLMs required a paid subscription (Claude Pro and Copilot Pro) and one was free (ChatGPT 3.5). While there were no clinical or statistical differences between ChatGPT 3.5 and Claude Pro, a separation of +5 points, or 0.56 standard deviations, was present between Copilot Pro and the other highly ranked LLMs. The readability of responses from all tested programs was above the eighth-grade reading level that the American Medical Association (AMA) recommends for public consumers.
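The standardized separation quoted in the Results can be checked with simple arithmetic: a raw gap in points divided by the reported standard deviation. The sketch below is illustrative only; the function name and constants are hypothetical, it uses only the two figures stated in the abstract (a 5-point gap and a standard deviation of 9), and it is not the authors' analysis code.

```python
def standardized_separation(point_gap: float, sd: float) -> float:
    """Express a raw score gap in units of the sample standard deviation."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return point_gap / sd


# Figures taken from the abstract; the names are illustrative assumptions.
REPORTED_SD = 9.0   # standard deviation of total scores, in points
REPORTED_GAP = 5.0  # separation between Copilot Pro and the other top LLMs

separation = standardized_separation(REPORTED_GAP, REPORTED_SD)
print(f"{separation:.2f} standard deviations")  # prints "0.56 standard deviations"
```

Dividing 5 by 9 gives approximately 0.56, which matches the figure reported in the abstract.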

Conclusion: Subscription LLMs were more prevalent among the highly ranked LLMs, suggesting that these perform better as ophthalmic assistants. While readability was poor for the average person, the content was understood by a board-certified ophthalmologist. The accuracy of LLMs is not high enough to recommend their use for patient care in standalone mode, but their use in aiding clinicians with patient care and preventing oversights is promising.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11568767
DOI: http://dx.doi.org/10.2147/OPTH.S488232
