The utility of artificial intelligence platforms for patient-generated questions in Mohs micrographic surgery: a multi-national, blinded expert panel evaluation.

Int J Dermatol

Baylor University Medical Center, Dallas, TX, USA.

Published: November 2024

The study evaluates large language models (LLMs) as patient-education tools for Mohs micrographic surgery (MMS), focusing on the appropriateness, clinical sufficiency, accuracy, and comprehensibility of their responses.
A panel of 15 MMS surgeons assessed LLM-generated responses to common patient questions. Most reviewers deemed all LLM responses appropriate, 75% of responses were rated mostly accurate or higher, and ChatGPT had the highest mean accuracy.
Although the responses were deemed appropriate, the majority of the panel considered only 33% of them sufficient for clinical practice, and the 10th-grade reading level they require may hinder patient understanding. Dermatologists should be aware of these limitations.

Background: Artificial intelligence (AI) and large language models (LLMs) transform how patients inform themselves. LLMs offer potential as educational tools, but their quality depends upon the information generated. Current literature examining AI as an informational tool in dermatology has been limited in evaluating AI's multifaceted roles and diversity of opinions. Here, we evaluate LLMs as a patient-educational tool for Mohs micrographic surgery (MMS) in and out of the clinic utilizing an international expert panel.

Methods: The most common patient MMS questions were extracted from Google and entered into two LLMs and Google's search engine. Fifteen MMS surgeons evaluated the generated responses, examining their appropriateness as a patient-facing informational platform, sufficiency of response in a clinical environment, and accuracy of content generated. Validated scales were employed to assess the comprehensibility of each response.

Results: The majority of reviewers deemed all LLM responses appropriate. Overall, 75% of responses were rated as mostly accurate or higher, and ChatGPT had the highest mean accuracy. The majority of the panel deemed only 33% of responses sufficient for clinical practice. The mean comprehensibility scores for all platforms indicated a required 10th-grade reading level.

Conclusions: LLM-generated responses were rated as appropriate patient informational sources and mostly accurate in their content. However, these platforms may not provide sufficient information to function in a clinical environment, and complex comprehensibility may represent a barrier to utilization. As the popularity of these platforms increases, it is important for dermatologists to be aware of these limitations.

DOI: http://dx.doi.org/10.1111/ijd.17382
