Objectives: Artificial intelligence chatbots have demonstrated feasibility and efficacy in improving health outcomes. In this study, responses to frequently asked questions about oral cancer from 5 publicly available AI chatbots (Bing, GPT-3.5, GPT-4, Google Bard, and Claude) were evaluated.
Study Design: Relevant patient-related frequently asked questions about oral cancer were obtained from two main sources: public health websites and social media platforms. From these sources, 20 oral cancer-related questions were selected. Four board-certified specialists in oral medicine/oral and maxillofacial pathology assessed the answers using a modified version of the global quality score on a 5-point Likert scale. Additionally, readability was measured using the Flesch-Kincaid Grade Level and Flesch Reading Ease scores. Responses were also assessed for empathy using a validated 5-point scale.
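The study does not state which software was used to compute the two readability metrics; as a minimal illustrative sketch (assuming the open-source Python textstat package, not the authors' actual tooling), the scores for a chatbot response could be obtained as follows:

```python
# Illustrative only: computes the two readability metrics named in the Study Design
# for a sample chatbot response, using the textstat package (assumed tool).
import textstat

response = (
    "Oral cancer can develop on the lips, tongue, gums, or the lining of the mouth. "
    "See a dentist or doctor if a sore does not heal within two weeks."
)

# Flesch Reading Ease: higher values indicate easier-to-read text (roughly 0-100).
print("Flesch Reading Ease:", textstat.flesch_reading_ease(response))

# Flesch-Kincaid Grade Level: approximate U.S. school grade needed to understand the text.
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(response))
```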
Results: Specialists ranked GPT-4 highest, with a total score of 17.3 ± 1.5, while Bing received the lowest at 14.9 ± 2.2. Bard had the highest Flesch Reading Ease score at 62 ± 7, while GPT-3.5 and Claude received the lowest scores (more challenging readability). GPT-4 and Bard emerged as the strongest chatbots in terms of empathy and citation accuracy on patient-related frequently asked questions pertaining to oral cancer. GPT-4 had the highest overall quality, whereas Bing showed the lowest quality, empathy, and citation accuracy.
Conclusion: GPT-4 demonstrated the highest quality responses to frequently asked questions pertaining to oral cancer. Although impressive in their ability to guide patients on common oral cancer topics, most chatbots did not perform well when assessed for empathy or citation accuracy.
DOI: http://dx.doi.org/10.1016/j.oooo.2024.12.028