AI Article Synopsis

  • This study assessed how well three large language models (LLMs) - ChatGPT-3.5, ChatGPT-4, and Google Bard - could create and improve patient education materials (PEMs) on paediatric cataract.
  • Researchers compared the LLMs' responses to prompts that varied in complexity and target readability, with a specific focus on a sixth-grade reading level.
  • Results showed that all three LLMs produced high-quality, accurate materials, with ChatGPT-4 standing out by generating the most readable PEMs and consistently lowering complexity to the specified reading level.

Article Abstract

Background/aims: This was a cross-sectional comparative study. We evaluated the ability of three large language models (LLMs) (ChatGPT-3.5, ChatGPT-4, and Google Bard) to generate novel patient education materials (PEMs) and improve the readability of existing PEMs on paediatric cataract.

Methods: We compared LLMs' responses to three prompts. Prompt A requested they write a handout on paediatric cataract that was 'easily understandable by an average American.' Prompt B modified prompt A and requested the handout be written at a 'sixth-grade reading level, using the Simple Measure of Gobbledygook (SMOG) readability formula.' Prompt C rewrote existing PEMs on paediatric cataract 'to a sixth-grade reading level using the SMOG readability formula'. Responses were compared on quality (DISCERN; 1 (low quality) to 5 (high quality)), understandability and actionability (Patient Education Materials Assessment Tool; ≥70%: understandable, ≥70%: actionable), accuracy (Likert misinformation scale; 1 (no misinformation) to 5 (high misinformation)) and readability (SMOG and Flesch-Kincaid Grade Level (FKGL); grade level <7: highly readable).
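For reference, the two readability metrics used in the prompts and evaluation have standard published forms, sketched below; the variable names are descriptive labels of our own choosing rather than notation from the paper.

\[ \text{SMOG grade} = 1.0430 \sqrt{\text{polysyllabic words} \times \frac{30}{\text{sentences}}} + 3.1291 \]

\[ \text{FKGL} = 0.39 \left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8 \left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59 \]

Both formulas estimate the US school grade level needed to comprehend a text, which is why a score below 7 is treated as meeting the sixth-grade readability target.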

Results: All LLM-generated responses were of high quality (median DISCERN ≥4), understandable (≥70%) and accurate (Likert = 1). None of the LLM-generated responses was actionable (<70%). ChatGPT-3.5 and ChatGPT-4 prompt B responses were more readable than their prompt A responses (p<0.001). ChatGPT-4 generated more readable responses (lower SMOG and FKGL scores; 5.59±0.5 and 4.31±0.7, respectively) than the other two LLMs (p<0.001) and consistently rewrote existing PEMs to or below the specified sixth-grade reading level (SMOG: 5.14±0.3).

Conclusion: LLMs, particularly ChatGPT-4, proved valuable in generating high-quality, readable, accurate PEMs and in improving the readability of existing materials on paediatric cataract.

Source
http://dx.doi.org/10.1136/bjo-2024-325252

Publication Analysis

Top Keywords (frequency)

  • paediatric cataract (12)
  • patient education (12)
  • large language (8)
  • language models (8)
  • education materials (8)
  • existing pems (8)
  • pems paediatric (8)
  • prompt requested (8)
  • reading level (8)
  • smog readability (8)
