An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support.

Chedva Levin Tehilla Kagan Shani Rosen Mor Saban

Int J Nurs Stud

Department of Nursing, School of Health Professions, Faculty of Medical and Health Sciences, Tel Aviv University, Israel.

Published: July 2024

Aim: To assess the clinical reasoning capabilities of two large language models, ChatGPT-4 and Claude-2.0, compared to those of neonatal nurses during neonatal care scenarios.

Design: A cross-sectional study with a comparative evaluation using a survey instrument that included six neonatal intensive care unit clinical scenarios.

Participants: 32 neonatal intensive care nurses with 5-10 years of experience working in the neonatal intensive care units of three medical centers.

Methods: Participants responded to six written clinical scenarios. Simultaneously, we asked ChatGPT-4 and Claude-2.0 to provide initial assessments and treatment recommendations for the same scenarios. The responses from ChatGPT-4 and Claude-2.0 were then scored by certified neonatal nurse practitioners for accuracy, completeness, and response time.

Results: Both models demonstrated capabilities in clinical reasoning for neonatal care, with Claude-2.0 significantly outperforming ChatGPT-4 in clinical accuracy and speed. However, limitations were identified across the cases in diagnostic precision, treatment specificity, and response lag.

Conclusions: While showing promise, current limitations reinforce the need for deep refinement before ChatGPT-4 and Claude-2.0 can be considered for integration into clinical practice. Additional validation of these tools is important to safely leverage this Artificial Intelligence technology for enhancing clinical decision-making.

Impact: The study provides an understanding of the reasoning accuracy of new Artificial Intelligence models in neonatal clinical care. The current accuracy gaps of ChatGPT-4 and Claude-2.0 need to be addressed prior to clinical usage.

DOI: http://dx.doi.org/10.1016/j.ijnurstu.2024.104771

Similar Publications

Development and Evaluation of a Mental Health Chatbot Using ChatGPT 4.0: Mixed Methods User Experience Study With Korean Users.

JMIR Med Inform

January 2025

Sungkyunkwan University, Seoul, Republic of Korea.

Boyoung Kang Munpyo Hong

Background: Mental health chatbots have emerged as a promising tool for providing accessible and convenient support to individuals in need. Building on our previous research on digital interventions for loneliness and depression among Korean college students, this study addresses the limitations identified and explores more advanced artificial intelligence-driven solutions.

Objective: This study aimed to develop and evaluate the performance of HoMemeTown Dr.

Assessing ChatGPT-4's Capabilities in Generating Dermatology Board Examination Content: An Explorational Study.

Acta Derm Venereol

January 2025

Department of Dermatology, Rambam Health Care Campus, Haifa, Israel; Technion Faculty of Medicine, Haifa, Israel.

Jonathan Shapiro Anna Lyakhovitsky Tamar Freud Felix Pavlotsky Ziyad Khamaysi

Feasibility of large language models for CEUS LI-RADS categorization of small liver nodules in patients at risk for hepatocellular carcinoma.

Front Oncol

December 2024

West China Hospital of Sichuan University, Chengdu, China.

Jiayan Huang Rui Yang Xiaotong Huang Keyu Zeng Yan Liu

Background: Large language models (LLMs) offer opportunities to enhance radiological applications, but their performance in handling complex tasks remains insufficiently investigated.

Purpose: To evaluate the performance of LLMs integrated with Contrast-enhanced Ultrasound Liver Imaging Reporting and Data System (CEUS LI-RADS) in diagnosing small (≤20mm) hepatocellular carcinoma (sHCC) in high-risk patients.

Materials And Methods: From November 2014 to December 2023, high-risk HCC patients with untreated small (≤20mm) focal liver lesions (sFLLs) were included in this retrospective study.

Readability and Appropriateness of Responses Generated by ChatGPT 3.5, ChatGPT 4.0, Gemini, and Microsoft Copilot for FAQs in Refractive Surgery.

Turk J Ophthalmol

December 2024

University of Health Sciences Türkiye, Başakşehir Çam and Sakura City Hospital, Clinic of Ophthalmology, İstanbul, Türkiye.

Fahri Onur Aydın Burakhan Kürşat Aksoy Ali Ceylan Yusuf Berk Akbaş Serhat Ermiş

Objectives: To assess the appropriateness and readability of large language model (LLM) chatbots' answers to frequently asked questions about refractive surgery.

Materials And Methods: Four commonly used LLM chatbots were asked 40 questions frequently asked by patients about refractive surgery. The appropriateness of the answers was evaluated by 2 experienced refractive surgeons.

The In-depth Comparative Analysis of Four Large Language AI Models for Risk Assessment and Information Retrieval from Multi-Modality Prostate Cancer Work-up Reports.

World J Mens Health

December 2024

Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei, Taiwan.

Lun-Hsiang Yuan Shi-Wei Huang Dean Chou Chung-You Tsai

Purpose: Information retrieval (IR) and risk assessment (RA) from multi-modality imaging and pathology reports are critical to prostate cancer (PC) treatment. This study aims to evaluate the performance of four general-purpose large language models (LLMs) in IR and RA tasks.

Materials And Methods: We conducted a study using simulated text reports from computed tomography, magnetic resonance imaging, bone scans, and biopsy pathology on stage IV PC patients.
