Objective: This study evaluated how generative artificial intelligence (AI) models respond to the questions and recommendations posed by the 2023 Congress of Neurological Surgeons (CNS) guidelines for Chiari 1 malformation, and how readable those responses are.

Methods: Thirteen questions were generated from the CNS guidelines and posed to Perplexity, ChatGPT 4o, Microsoft Copilot, and Google Gemini. AI answers were classified as "concordant" or "non-concordant" according to their alignment with the current CNS guidelines. Non-concordant answers were sub-categorized as "insufficient" or "over-conclusive." Responses were evaluated for readability using the Flesch-Kincaid Grade Level, Gunning Fog Index, SMOG (Simple Measure of Gobbledygook) Index, and Flesch Reading Ease test.
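The four readability measures named above have standard published formulas, which can be sketched as follows. This is a minimal illustration, not the study's scoring pipeline: the abstract does not say which tool was used, the function names are hypothetical, and the functions assume that word, sentence, and syllable counts have already been obtained by some other means.

```python
import math

# Standard readability formulas, computed from pre-counted text statistics.
# Function names are hypothetical; how the study counted words, sentences,
# and syllables is not stated in the abstract.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher is easier; scores below about 30 are usually read as very difficult."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """complex_words: words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))

def smog_index(polysyllables: int, sentences: int) -> float:
    """polysyllables: count of words with three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * (30 / sentences)) + 3.1291

if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    w, s, syl, cx = 100, 5, 150, 10
    print(flesch_reading_ease(w, s, syl))
    print(flesch_kincaid_grade(w, s, syl))
    print(gunning_fog(w, s, cx))
    print(smog_index(cx, s))
```

On the three grade-style indices a higher value means harder text, whereas on Flesch Reading Ease a lower value means harder text, which is why the reported grade levels of roughly 14 to 19 and Reading Ease scores near 20 point in the same direction.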

Results: Perplexity displayed the highest concordance rate at 69.2 %, with 0 % of its responses classified as insufficient and 30.8 % as over-conclusive. ChatGPT 4o had the lowest concordance rate at 23.1 %, with 0 % insufficient and 76.9 % over-conclusive. Copilot showed a 61.5 % concordance rate, with 7.7 % insufficient and 30.8 % over-conclusive. Gemini demonstrated a 30.8 % concordance rate, with 7.7 % insufficient and 61.5 % over-conclusive. Flesch-Kincaid Grade Level scores ranged from 14.48 (Gemini) to 16.48 (Copilot), Gunning Fog Index scores from 16.18 (Gemini) to 18.8 (Copilot), and SMOG Index scores from 16 (Gemini) to 17.54 (Copilot). Flesch Reading Ease scores were low across all models, with Gemini showing the highest mean score of 21.3.
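The percentages above are consistent with whole-number counts out of the thirteen questions. The sketch below back-calculates those counts as an arithmetic check. The counts are an inference from the reported percentages, not figures stated in the abstract, and the variable and function names are hypothetical.

```python
# Back-calculate response counts from the reported percentages.
# ASSUMPTION: each percentage is a share of all 13 responses per model;
# the resulting counts are inferred, not reported in the abstract.

N_QUESTIONS = 13

# (concordant %, insufficient %, over-conclusive %) as reported.
reported = {
    "Perplexity": (69.2, 0.0, 30.8),
    "ChatGPT 4o": (23.1, 0.0, 76.9),
    "Copilot": (61.5, 7.7, 30.8),
    "Gemini": (30.8, 7.7, 61.5),
}

def to_counts(percentages, n=N_QUESTIONS):
    """Convert percentages of n responses to the nearest whole counts."""
    return tuple(round(p / 100 * n) for p in percentages)

counts = {model: to_counts(pcts) for model, pcts in reported.items()}

if __name__ == "__main__":
    for model, (concordant, insufficient, over) in counts.items():
        total = concordant + insufficient + over
        print(f"{model}: {concordant} concordant, {insufficient} insufficient, "
              f"{over} over-conclusive (total {total})")
```

Under this assumption each model's three categories sum to 13, which supports reading the insufficient and over-conclusive percentages as shares of all responses rather than of the non-concordant subset.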

Conclusion: Perplexity and Copilot were the best-performing models for concordance, while ChatGPT 4o and Gemini had the highest over-conclusive rates. All responses were highly complex and difficult to read. While AI can be valuable in certain aspects of clinical practice, the low concordance rates show that AI should not replace clinician judgement.


Source: http://dx.doi.org/10.1016/j.clineuro.2024.108662

