Background: This study evaluates the performance of various large language models (LLMs) in generating responses for ophthalmology emergencies and compares their accuracy with the United Kingdom's established National Health Service (NHS) 111 online system.

Methods: We included 21 ophthalmology-related emergency scenario questions from the NHS 111 triaging algorithm. These questions were based on four ophthalmology emergency themes laid out in the NHS 111 algorithm. Responses generated by NHS 111 online were compared with those of several LLM chatbots (ChatGPT-3.5, Google Bard, Bing Chat, and ChatGPT-4.0) to determine the accuracy of the LLM responses. Each LLM chatbot response was compared against the NHS 111 triage output using a two-prompt strategy. Answers were graded as follows: -2 as "Very poor", -1 as "Poor", 0 as "No response", 1 as "Good", 2 as "Very good", and 3 as "Excellent".

Results: Overall, the LLMs attained good accuracy in this study when compared against the NHS 111 responses. A score of ≥1 ("Good" or better) was achieved by 93% of all LLM responses, meaning that at least part of the answer contained correct information and no incorrect information was present. Results were very similar across both prompts, with no marked difference.
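
The headline figure above is the share of responses graded 1 or higher on the six-point scale described in the Methods. A minimal sketch of that calculation follows, assuming each response has already been assigned an integer grade. The function name `proportion_at_least` and the example scores are hypothetical and are not taken from the study.

```python
# Grading scale as described in the Methods of the abstract.
GRADE_LABELS = {
    -2: "Very poor",
    -1: "Poor",
    0: "No response",
    1: "Good",
    2: "Very good",
    3: "Excellent",
}


def proportion_at_least(scores, threshold=1):
    """Return the fraction of scores that are >= threshold.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    for score in scores:
        if score not in GRADE_LABELS:
            raise ValueError(f"unknown grade: {score}")
    return sum(1 for score in scores if score >= threshold) / len(scores)


# Made-up grades for illustration only; NOT data from the study.
example_scores = [3, 2, 1, -1, 2, 0, 3, 1]
share = proportion_at_least(example_scores)
print(f"{share:.0%} of the example responses were graded 'Good' or better")
```

Applied to the study's actual grades, the same calculation would yield the reported 93%; the abstract does not give the per-response grades, so they are not reproduced here.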

Conclusions: The high accuracy and safety observed in LLM responses support their potential as effective tools for providing timely information and guidance to patients. LLMs hold promise for enhancing patient care and healthcare accessibility in the digital age.

Source
http://dx.doi.org/10.1038/s41433-025-03605-8

Similar Publications

Ethnicity and breast cancer incidence in over 329,500 women in England in 2011-2019.

Eur J Surg Oncol

January 2025

Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford.

Introduction: Previous studies have reported an overall lower breast cancer incidence in women from Asian and Black backgrounds compared with white women. Age standardised and age specific incidence rates in the largest specific ethnicities within Asian and Black groups are not reported.

Materials And Methods: Data on population size and the age distribution of women in five ethnic groups of interest (white British, Black African, Black Caribbean, Indian and Pakistani) were extracted from the Office for National Statistics 2001, 2011 and 2021 census data for England.

Accelerator neutron sources for BNCT: Current status and some pointers for future development.

Appl Radiat Isot

January 2025

Particle Radiation Oncology Research Center, Institute for Integrated Radiation and Nuclear Science, Kyoto University, 2-Asashiro-Nishi, Kumatori-cho, Sennan-gun, Osaka, 590-0494, Japan.

Recent decades have seen the development of accelerator neutron sources suitable for installation in a hospital setting. Numerous challenges have been faced and solved to deliver technology which continues to transform the field of BNCT. This paper begins by briefly reviewing the technologies which are currently, or soon will be, in clinical use.

Background: Atrial fibrillation (AF) is the most common arrhythmia worldwide. Data regarding 30-day readmission following index admission for AF in the developing world are poorly described.

Objectives: The study aimed to assess the rate, predictors, and trends of 30-day readmission after index admission for AF in Syria.

Purpose: To quantify the effect of cataract surgery on cornea shape.

Methods: Patients undergoing cataract surgery with standardised 2.75 mm surgical incisions at 110 degrees with a side port at 50 degrees were included.
