Enhancing Diagnostic Support for Chiari Malformation and Syringomyelia: A Comparative Study of Contextualized ChatGPT Models.

Ethan D L Brown, Max Ward, Apratim Maity, Mark A Mittler, Sheng-Fu Larry Lo, Randy S D'Amico

World Neurosurg

Department of Neurologic Surgery, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, New York, USA.

Published: September 2024

This study assesses how different contextualization methods impact ChatGPT's ability to provide medical recommendations for Chiari Malformation and Syringomyelia.
Contextualized versions of GPT-4 showed significantly greater agreement with expert consensus statements than the standard GPT-4 model, indicating that their medical advice was more consistent with expert guidance.
Results also highlighted increased readability and reduced word count in the contextualized models, suggesting better communication of complex medical information.

Objectives: The rapidly increasing adoption of large language models in medicine has drawn attention to potential applications within neurosurgery. This study evaluates the effects of various contextualization methods on ChatGPT's ability to provide recommendations on the diagnosis and management of Chiari Malformation and Syringomyelia that align with expert consensus.

Methods: Native GPT4 and GPT4 models contextualized using various strategies were asked questions revised from the 2022 Chiari and Syringomyelia Consortium International Consensus Document. Reviewers then compared the ChatGPT responses with the consensus statements, assessing 1) whether the response addressed the prompt, 2) agreement of the response with the consensus statements, 3) whether it recommended consulting a medical professional, and 4) the presence of supplementary information. Flesch-Kincaid, SMOG, and Gunning-Fog readability scores, along with word count, were calculated for each model using the quanteda package in R.
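The readability indices named in the Methods are closed-form formulas over sentence, word, and syllable counts. The Python sketch below applies the standard published coefficients; it is a hypothetical stand-in for illustration, not the authors' quanteda pipeline. The helper names (`count_syllables`, `readability`), the vowel-group syllable heuristic, and the simplified Gunning-Fog rule (every word of three or more syllables counts as complex) are assumptions, so its scores will not match quanteda's output exactly.

```python
import math
import re


# Hypothetical helper: crude syllable estimate that counts vowel groups.
# quanteda (and other tools) use different, more careful syllable counting.
def count_syllables(word: str) -> int:
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic ("make"), but keep "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


# Hypothetical helper: word count plus four standard readability indices.
def readability(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n_sent = max(len(sentences), 1)
    n_words = max(len(words), 1)
    syllables = [count_syllables(w) for w in words]
    n_syll = sum(syllables)
    # Assumption: any word with 3+ syllables counts as "complex"/polysyllabic.
    n_poly = sum(1 for s in syllables if s >= 3)

    wps = n_words / n_sent  # mean words per sentence
    spw = n_syll / n_words  # mean syllables per word
    return {
        "word_count": len(words),
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * n_poly / n_words),
        "smog": 1.0430 * math.sqrt(n_poly * 30 / n_sent) + 3.1291,
    }


if __name__ == "__main__":
    sample = ("Consult a neurosurgeon. Imaging with MRI can confirm the "
              "diagnosis of Chiari malformation.")
    for name, value in readability(sample).items():
        print(f"{name}: {value:.1f}")
```

Note the opposite directions of these scales: a higher Flesch Reading Ease score indicates easier text, whereas lower Flesch-Kincaid Grade Level, SMOG, and Gunning-Fog values indicate easier text. SMOG was designed for 30-sentence samples, so the `30 / n_sent` factor rescales shorter responses.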

Results: Relative to GPT4, all contextualized GPTs demonstrated increased agreement with consensus statements. The PDF+Prompting and Prompting models achieved the highest agreement scores, 19 of 24 and 23 of 24, respectively, versus 9 of 24 for GPT4 (p=.021, p=.001). A trend toward improved readability was observed when comparing the contextualized models as a group with ChatGPT4, with significant decreases in average word count (180.7 vs 382.3, p<.001) and Flesch-Kincaid Reading Ease score (11.7 vs 17.2, p=.033).
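For scale, the agreement counts and mean word counts reported in the Results convert to the following proportions and relative reduction. This is simple arithmetic on the reported figures only, not an additional analysis:

```latex
\begin{aligned}
\text{Prompting: } \tfrac{23}{24} &\approx 95.8\% \\
\text{PDF+Prompting: } \tfrac{19}{24} &\approx 79.2\% \\
\text{GPT4: } \tfrac{9}{24} &= 37.5\% \\
\text{Mean word-count reduction: } \tfrac{382.3 - 180.7}{382.3} = \tfrac{201.6}{382.3} &\approx 52.7\%
\end{aligned}
```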

Conclusions: The enhanced performance observed with ChatGPT4 contextualization suggests broader applications of large language models in neurosurgery than the current literature indicates. This study provides proof of concept for the use of contextualized GPT models in neurosurgical contexts and shows that improved model performance is easily accessible.

DOI: http://dx.doi.org/10.1016/j.wneu.2024.05.172