Validity of ChatGPT-generated musculoskeletal images.

Skeletal Radiol

Department of Musculoskeletal Radiology, Royal Orthopedic Hospital, Bristol Road South, Northfield, Birmingham, UK.

Published: August 2024

AI Article Synopsis

  • The study examines the potential of ChatGPT4, a Large Language Model, to automate the creation of anatomical illustrations for musculoskeletal radiology research.
  • Evaluation of 24 generated images showed significant shortcomings, particularly in anatomical accuracy, annotation correctness, and usability for research papers.
  • Despite the promise of LLMs for quick figure generation, ChatGPT4's current output does not meet the high standards required for musculoskeletal research, highlighting the need for improvements in future iterations.

Article Abstract

Objective: In the evolving landscape of medical research and radiology, effective communication of intricate ideas is imperative, with visualizations playing a crucial role. This study explores the transformative potential of ChatGPT4, a powerful Large Language Model (LLM), in automating the creation of schematics and figures for radiology research papers, specifically focusing on its implications for musculoskeletal studies.

Materials And Methods: Deploying ChatGPT4, the study assessed the model's ability to generate anatomical images of six large joints: shoulder, elbow, wrist, hip, knee, and ankle. Four variations of a text prompt were used to generate a coronal illustration with annotations for each joint. Evaluation parameters included anatomical correctness, correctness of annotations, aesthetic quality of the illustrations, usability of the figures in research papers, and cost-effectiveness. Four panellists performed the assessment using a 5-point Likert scale.

Results: Overall analysis of the 24 illustrations covering the six joints of interest (four per joint) revealed significant limitations in ChatGPT4's performance. The anatomical design ranged from poor to good; all of the illustrations received a below-average rating for annotation, with the majority assessed as poor, and all ranked below average for usability in research papers. There was good agreement between raters across all domains (ICC = 0.61).
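The abstract reports inter-rater agreement as a single ICC but does not state which ICC form or software was used. Purely as an illustration of how such a figure can be derived from a panellist-by-illustration rating matrix, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rater) in Python; the 24 x 4 shape mirrors the study design, but the ratings themselves are made up.

    import numpy as np

    def icc_2_1(x: np.ndarray) -> float:
        # Two-way random-effects, absolute-agreement, single-rater ICC(2,1),
        # following the Shrout-Fleiss mean-square decomposition.
        n, k = x.shape                      # n illustrations, k raters
        grand = x.mean()
        row_means = x.mean(axis=1)          # per-illustration means
        col_means = x.mean(axis=0)          # per-rater means
        msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-target MS
        msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-rater MS
        resid = x - row_means[:, None] - col_means[None, :] + grand
        mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # residual MS
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

    # Hypothetical 5-point Likert ratings: 24 illustrations x 4 panellists.
    rng = np.random.default_rng(0)
    ratings = rng.integers(1, 6, size=(24, 4)).astype(float)
    print(round(icc_2_1(ratings), 2))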

Conclusion: While LLMs like ChatGPT4 present promising prospects for rapid figure generation, their current capabilities fall short of meeting the rigorous standards demanded by musculoskeletal radiology research. Future developments should focus on iterative refinement processes to enhance the realism of LLM-generated musculoskeletal schematics.

Source
http://dx.doi.org/10.1007/s00256-024-04638-y

Similar Publications

Quantifying the Scope of Artificial Intelligence-Assisted Writing in Orthopaedic Medical Literature: An Analysis of Prevalence and Validation of AI-Detection Software.

J Am Acad Orthop Surg

January 2025

From the Department of Orthopaedic Surgery, University Hospitals of Cleveland, Case Western Reserve University, Cleveland, OH (Porto, Morgan, Hecht, Burkhart, and Liu), and the Case Western Reserve University School of Medicine, Cleveland, OH (Porto, Morgan, and Hecht).

Introduction: The popularization of generative artificial intelligence (AI), including Chat Generative Pre-trained Transformer (ChatGPT), has raised concerns for the integrity of academic literature. This study asked the following questions: (1) Has the popularization of publicly available generative AI, such as ChatGPT, increased the prevalence of AI-generated orthopaedic literature? (2) Can AI detectors accurately identify ChatGPT-generated text? (3) Are there associations between article characteristics and the likelihood that it was AI generated?

Methods: PubMed was searched across six major orthopaedic journals to identify articles received for publication after January 1, 2023. Two hundred and forty articles were randomly selected and entered into three popular AI detectors.

Objective: In this study, the authors assessed the ability of Chat Generative Pre-trained Transformer (ChatGPT) 3.5 and ChatGPT4 to generate readable and accurate summaries of published neurosurgical literature.

Methods: Abstracts published in journal issues released between June 2023 and August 2023 (n = 150) were randomly selected from the top 5 ranked neurosurgical journals according to Google Scholar.

Purpose: To compare the accuracy and readability of responses to oculoplastics patient questions provided by Google and ChatGPT. Additionally, to assess the ability of ChatGPT to create customized patient education materials.

Methods: We executed a Google search to identify the 3 most frequently asked patient questions (FAQs) related to 10 oculoplastics conditions.

Detection of ChatGPT fake science with the xFakeSci learning algorithm.

Sci Rep

July 2024

Hefei University of Technology, Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei, 230009, China.

Article Synopsis
  • Generative AI tools like ChatGPT are increasingly being used to create articles, prompting this study to explore the unique characteristics of AI-generated content compared to scientific publications.
  • The research involves creating articles on various diseases using prompt engineering and developing a new algorithm, xFakeSci, which can differentiate between AI-generated and authentic scientific articles through a rigorous training process.
  • xFakeSci outperformed traditional data mining algorithms in accuracy, achieving F1 scores of 80 to 94% (the F1 metric is illustrated below), thanks to its innovative calibration methods and proximity distance heuristics, highlighting its effectiveness in identifying fake science.
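The F1 scores quoted above are the harmonic mean of precision and recall. As a minimal, self-contained illustration (the counts are made up, not taken from the xFakeSci study):

    def f1_score(tp: int, fp: int, fn: int) -> float:
        # F1 is the harmonic mean of precision and recall.
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    # Hypothetical counts: 90 true positives, 10 false positives, 8 false negatives.
    print(round(f1_score(90, 10, 8), 3))  # 0.909, i.e. an F1 of about 91%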

ChatGPT's role in creating multiple-choice questions (MCQs) is growing, but the validity of these artificial-intelligence-generated questions is unclear. This literature review was conducted to address the urgent need to understand the application of ChatGPT in generating MCQs for medical education. Following the database search and screening of 1920 studies, we found 23 relevant studies.
