Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis.

Mingxin Liu Tsuyoshi Okuhara Wenbo Huang Atsushi Ogihara Hikari Sophia Nagao Hiroko Okada Takahiro Kiuchi

Int Dent J

Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan.

Published: November 2024

Introduction And Aims: This study systematically reviews and conducts a meta-analysis to evaluate the performance of various large language models (LLMs) in dental licensing examinations worldwide. The aim is to assess the accuracy of these models in different linguistic and geographical contexts. This will inform their potential application in dental education and diagnostics.

Methods: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we conducted a comprehensive search across PubMed, Web of Science, and Scopus for studies published from 1 January 2022 to 1 May 2024. Two authors independently reviewed the literature based on the inclusion and exclusion criteria, extracted data, and evaluated the quality of the studies in accordance with the Quality Assessment of Diagnostic Accuracy Studies-2. We conducted qualitative and quantitative analyses to evaluate the performance of LLMs.

Results: Eleven studies met the inclusion criteria, encompassing dental licensing examinations from eight countries. GPT-3.5, GPT-4, and Bard achieved integrated accuracy rates of 54%, 72%, and 56%, respectively. GPT-4 outperformed GPT-3.5 and Bard, passing more than half of the dental licensing examinations. Subgroup analyses and meta-regression showed that GPT-3.5 performed significantly better in English-speaking countries. GPT-4's performance, however, remained consistent across different regions.

Conclusion: LLMs, particularly GPT-4, show potential in dental education and diagnostics, yet their accuracy remains below the threshold required for clinical application. The lack of sufficient training data in dentistry has affected LLMs' accuracy. The reliance on image-based diagnostics also presents challenges. As a result, their accuracy in dental exams is lower compared to medical licensing exams. Additionally, LLMs even provide more detailed explanation for incorrect answer than correct one. Overall, the current LLMs are not yet suitable for use in dental education and clinical diagnosis.

Download full-text PDF	Source
http://dx.doi.org/10.1016/j.identj.2024.10.014	DOI Listing

Publication Analysis

Top Keywords

dental licensing

licensing examinations

dental education

large language

language models

dental

evaluate performance

accuracy

licensing

models dental

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!