Evaluating the Appropriateness, Consistency, and Readability of ChatGPT in Critical Care Recommendations.

Kaan Y Balta, Arshia P Javidan, Eric Walser, Robert Arntfield, Ross Prager

J Intensive Care Med

Division of Critical Care, London Health Sciences Centre, Western University, London, Ontario, Canada.

Published: February 2025

We assessed 2 versions of the large language model (LLM) ChatGPT, versions 3.5 and 4.0, in generating appropriate, consistent, and readable recommendations on core critical care topics. How do successive large language models compare in generating appropriate, consistent, and readable recommendations on core critical care topics? A set of 50 LLM-generated responses to clinical questions was evaluated by 2 independent intensivists on a 5-point Likert scale for appropriateness, consistency, and readability. ChatGPT 4.0 showed significantly higher median appropriateness scores than ChatGPT 3.5 (4.0 vs 3.0, P < .001). However, there was no significant difference in consistency between the 2 versions (40% vs 28%, P = 0.291). Readability, assessed by the Flesch-Kincaid Grade Level, was also not significantly different between the 2 models (14.3 vs 14.4, P = 0.93). Both models produced "hallucinations" (misinformation delivered with high confidence), which highlights the risk of relying on these tools without domain expertise. Despite their potential for clinical application, both models lacked consistency, producing different results when asked the same question multiple times. The study underscores the need for clinicians to understand the strengths and limitations of LLMs for safe and effective implementation in critical care settings. https://osf.io/8chj7/.
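For readers unfamiliar with the readability metric named above, the Flesch-Kincaid Grade Level is computed as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The following is a minimal sketch of that formula, not the authors' pipeline: the abstract does not say which tool was used, and the helper names `count_syllables` and `flesch_kincaid_grade`, along with the vowel-group syllable heuristic, are assumptions made for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumed heuristic, not a dictionary lookup):
    count runs of vowels and drop one for a likely silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    # Naive sentence split on ., !, ? (decimals such as "4.0" would be
    # miscounted as sentence breaks; acceptable for a sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters and apostrophes; digits are ignored.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    sample = "Early antibiotics and fluid resuscitation are recommended in septic shock."
    print(round(flesch_kincaid_grade(sample), 1))
```

A grade level near 14, as reported for both models, corresponds roughly to text written for a reader with some college education. Dedicated readability libraries use more careful syllable counting, so scores from this sketch will differ somewhat from published values.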

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639400	PMC
http://dx.doi.org/10.1177/08850666241267871	DOI Listing
