Performance of ChatGPT, human radiologists, and context-aware ChatGPT in identifying AO codes from radiology reports.

Maximilian F Russe, Anna Fink, Helen Ngo, Hien Tran, Fabian Bamberg, Marco Reisert, Alexander Rau

Sci Rep

Department of Diagnostic and Interventional Radiology, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Breisacher Str. 64, 79106, Freiburg, Germany.

Published: August 2023

While radiologists can describe a fracture's morphology and complexity with ease, translating that description into a classification system such as the Arbeitsgemeinschaft für Osteosynthesefragen (AO) Fracture and Dislocation Classification Compendium is more challenging. We tested the performance of generic chatbots and of chatbots given specific knowledge of the AO classification through a vector index, and compared them with human readers. In the 100 radiology reports we created from random AO codes, chatbots provided AO codes significantly faster than humans (mean 3.2 s vs. 50 s per case, p < .001) but did not reach human performance (best chatbot: 86% correct full AO codes vs. 95% for human readers). In general, chatbots based on GPT-4 outperformed those based on GPT-3.5-Turbo. Further, providing specific knowledge substantially improved the chatbot's performance and consistency: the context-aware chatbot based on GPT-4 provided 71% consistently correct full AO codes, compared with 2% for the generic ChatGPT 4. This provides evidence that refining ChatGPT and supplying it with specific context will be the next essential step in harnessing its power.
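The abstract does not spell out how the vector index supplies AO knowledge to the chatbot. A common pattern, retrieval-augmented prompting, embeds chunks of reference text, retrieves the chunks most similar to the incoming report, and prepends them to the prompt. The Python sketch below assumes that pattern. The knowledge snippets are simplified placeholders, the bag-of-words `embed` function stands in for a learned embedding model, all function names are hypothetical, and the actual call to the chat model is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified knowledge chunks standing in for indexed text of the
# AO/OTA compendium. A real system would index the compendium itself.
KNOWLEDGE_CHUNKS = [
    "31A: proximal femur, trochanteric region fracture",
    "31B: proximal femur, femoral neck fracture",
    "31C: proximal femur, femoral head fracture",
]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Precompute (chunk, vector) pairs; this list plays the role of the vector index."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, query: str, k: int = 2):
    """Return the k chunks most similar to the query, best match first."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(index, report: str, k: int = 2) -> str:
    """Prepend retrieved context to the report; sending it to a chat model is not shown."""
    context = "\n".join(retrieve(index, report, k))
    return (
        "Use the following AO classification excerpts to answer.\n"
        f"{context}\n\n"
        f"Report: {report}\nAO code:"
    )


if __name__ == "__main__":
    index = build_index(KNOWLEDGE_CHUNKS)
    print(build_prompt(index, "Displaced fracture of the femoral neck.", k=1))
```

Run on the sample report, the retrieved context is the femoral-neck snippet, because that chunk shares the most terms with the report. A production setup would replace `embed` with an embedding model and the list with a dedicated vector store; which tools the authors used is not stated in the abstract.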

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468502
DOI: http://dx.doi.org/10.1038/s41598-023-41512-8

Similar Publications

ChatGPT's Attitude, Knowledge, and Clinical Application in Geriatrics Practice and Education: Exploratory Observational Study.

JMIR Form Res

January 2025

Minneapolis VA Health Care System, Minneapolis, MN, United States.

Huai Yong Cheng

Background: The increasing use of ChatGPT in clinical practice and medical education necessitates the evaluation of its reliability, particularly in geriatrics.

Objective: This study aimed to evaluate ChatGPT's trustworthiness in geriatrics through 3 distinct approaches: evaluating ChatGPT's geriatrics attitude, knowledge, and clinical application with 2 vignettes of geriatric syndromes (polypharmacy and falls).

Methods: We used the validated University of California, Los Angeles, geriatrics attitude and knowledge instruments to evaluate ChatGPT's geriatrics attitude and knowledge and compare its performance with that of medical students, residents, and geriatrics fellows from reported results in the literature.

Utility of ChatGPT as a preparation tool for the Orthopaedic In-Training Examination.

J Exp Orthop

January 2025

Department of Orthopaedic Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA.

Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi

Purpose: Chat Generative Pre-Trained Transformer (ChatGPT) may have implications as a novel educational resource. There are differences in opinion on the best resource for the Orthopaedic In-Training Exam (OITE) as information changes from year to year. This study assesses ChatGPT's performance on the OITE for use as a potential study resource for residents.

M.I.N.I.-KID interviews with adolescents: a corpus-based language analysis of adolescents with depressive disorders and the possibilities of continuation using Chat GPT.

Front Psychiatry

December 2024

Department of Information Science, University of Regensburg, Regensburg, Germany.

Irina Jarvers, Angelika Ecker, Pia Donabauer, Katharina Kampa, Maximilian Weißenbacher

Background: Up to 13% of adolescents suffer from depressive disorders. Despite the high psychological burden, adolescents rarely decide to contact child and adolescent psychiatric services. To provide a low-barrier alternative, our long-term goal is to develop a chatbot for early identification of depressive symptoms.

Letter to the Editor: "Comparative analysis of GPT-4-based ChatGPT's diagnostic performance with radiologists using real-world radiology reports of brain tumors".

Eur Radiol

January 2025

Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, China.

Yang Zhang

Reply to Letter to the Editor: "Comparative analysis of GPT-4 based ChatGPT's diagnostic performance with radiologists using real-world radiology reports of brain tumors".

Eur Radiol

January 2025

Department of Diagnostic and Interventional Radiology, Graduate School of Medicine, Osaka Metropolitan University, Osaka, Japan.

Yasuhito Mitsuyama, Hiroyuki Tatekawa, Hirotaka Takita, Shannon L Walston, Yukio Miki
