Objectives: Artificial intelligence chatbots have demonstrated feasibility and efficacy in improving health outcomes. In this study, responses to frequently asked questions about oral cancer from 5 publicly available AI chatbots (Bing, GPT-3.5, GPT-4, Google Bard, and Claude) were evaluated.
Study Design: Relevant patient-related frequently asked questions about oral cancer were obtained from two main sources: public health websites and social media platforms. From these sources, 20 oral cancer-related questions were selected. Four board-certified specialists in oral medicine/oral and maxillofacial pathology assessed the answers using a modified version of the global quality score on a 5-point Likert scale. Additionally, readability was measured using the Flesch-Kincaid Grade Level and Flesch Reading Ease scores. Responses were also assessed for empathy using a validated 5-point scale.
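The study does not state which software was used to compute the two readability metrics; as a minimal illustrative sketch (assuming the open-source Python textstat package, not the authors' actual tooling), the scores for a chatbot response could be obtained as follows:

```python
# Illustrative only: computes the two readability metrics named in the Study Design
# for a sample chatbot response, using the textstat package (assumed tool).
import textstat

response = (
    "Oral cancer can develop on the lips, tongue, gums, or the lining of the mouth. "
    "See a dentist or doctor if a sore does not heal within two weeks."
)

# Flesch Reading Ease: higher values indicate easier-to-read text (roughly 0-100).
print("Flesch Reading Ease:", textstat.flesch_reading_ease(response))

# Flesch-Kincaid Grade Level: approximate U.S. school grade needed to understand the text.
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(response))
```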
Results: Specialists ranked GPT-4 highest, with a total score of 17.3 ± 1.5, while Bing received the lowest at 14.9 ± 2.2. Bard had the highest Flesch Reading Ease score at 62 ± 7, while GPT-3.5 and Claude received the lowest scores (more challenging readability). GPT-4 and Bard emerged as the strongest chatbots in terms of empathy and citation accuracy on patient-related frequently asked questions pertaining to oral cancer. GPT-4 had the highest overall quality, whereas Bing showed the lowest quality, empathy, and citation accuracy.
Conclusion: GPT-4 demonstrated the highest quality responses to frequently asked questions pertaining to oral cancer. Although impressive in their ability to guide patients on common oral cancer topics, most chatbots did not perform well when assessed for empathy or citation accuracy.
DOI: http://dx.doi.org/10.1016/j.oooo.2024.12.028