Artificial intelligence (AI) has emerged as a transformative tool in education, particularly in specialized fields such as dentistry. This study evaluated the performance of four advanced AI models, ChatGPT-4o (San Francisco, CA: OpenAI), ChatGPT-o1, Gemini 1.5 Pro (Mountain View, CA: Google LLC), and Gemini 2.0 Advanced, on the Turkish Dental Specialty Examination (DUS) for 2020 and 2021. A total of 240 questions, comprising 120 questions per year from the basic and clinical sciences, were analyzed. The models were assessed on the accuracy of their answers against the official answer keys. For the 2020 DUS, ChatGPT-o1 and Gemini 2.0 Advanced achieved the highest accuracy rates of 93.70% and 96.80%, respectively, with net scores of 112.50 and 115 out of 120 questions; ChatGPT-4o and Gemini 1.5 Pro followed with accuracy rates of 83.33% and 85.40%. For the 2021 DUS, ChatGPT-o1 again demonstrated the highest accuracy at 97.88% (net score 115.50), closely followed by Gemini 2.0 Advanced at 96.82% (net score 114.25), while ChatGPT-4o and Gemini 1.5 Pro again scored lower, with accuracy rates of 88.35% and 93.64%, respectively. Combining results from both years (238 total questions), ChatGPT-o1 and Gemini 2.0 Advanced achieved accuracy rates of 97.46% (230 correct answers; 95% CI: 94.62%, 100.00%) and 97.90% (231 correct answers; 95% CI: 94.62%, 100.00%), respectively, significantly outperforming ChatGPT-4o (88.66%; 211 correct answers; 95% CI: 85.43%, 91.89%) and Gemini 1.5 Pro (91.60%; 218 correct answers; 95% CI: 87.75%, 95.45%). Statistical analysis revealed significant differences among the models (p = 0.0002), and pairwise comparisons after Bonferroni correction showed that ChatGPT-4o performed significantly worse than ChatGPT-o1 (p = 0.0016) and Gemini 2.0 Advanced (p = 0.0007). The consistently high accuracy rates and narrow confidence intervals of the top-performing models underscore their reliability in answering DUS questions. Generative AI models such as ChatGPT-o1 and Gemini 2.0 Advanced have the potential to enhance dental board exam preparation through question evaluation. At the same time, because these models appear to outperform human examinees on DUS questions, the study raises concerns about the ethical use of AI and about the validity of the DUS as a measure of dental competency; assessment targeting higher levels of knowledge should be considered. This research contributes to the growing body of literature on AI applications in specialized knowledge domains and provides a foundation for further exploration of AI integration into dental education.
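The abstract does not state which confidence-interval method or which omnibus and pairwise tests were used. The sketch below is a minimal illustration, assuming normal-approximation (Wald) intervals for the per-model accuracies, Cochran's Q as the omnibus test, and exact McNemar tests with Bonferroni correction for the pairwise comparisons; the authors' actual procedures may differ. The correct-answer counts (211, 218, 230, and 231 of 238) are taken from the abstract, but the per-question 0/1 data are randomly generated placeholders, so the test output will not reproduce the published p-values. Note also that the published accuracy figures appear to reflect net scores (fractional values suggest a penalty for wrong answers), so raw proportions computed from the counts need not match them exactly.

```python
# Sketch of the kind of accuracy/CI summary and pairwise testing described above.
# Assumptions (not stated in the abstract): Wald 95% CIs, Cochran's Q omnibus test,
# exact McNemar pairwise tests, Bonferroni correction. Per-question data are HYPOTHETICAL.
import numpy as np
from statsmodels.stats.proportion import proportion_confint
from statsmodels.stats.contingency_tables import mcnemar, cochrans_q
from statsmodels.stats.multitest import multipletests

N_QUESTIONS = 238  # combined 2020 + 2021 questions analyzed

# Correct-answer counts reported in the abstract.
correct = {
    "ChatGPT-4o": 211,
    "Gemini 1.5 Pro": 218,
    "ChatGPT-o1": 230,
    "Gemini 2.0 Advanced": 231,
}

# Per-model accuracy with a normal-approximation 95% confidence interval.
for model, k in correct.items():
    lo, hi = proportion_confint(k, N_QUESTIONS, alpha=0.05, method="normal")
    print(f"{model}: {k / N_QUESTIONS:.2%} (95% CI {lo:.2%}-{hi:.2%})")

# HYPOTHETICAL per-question correctness (0/1) consistent with the counts above;
# the real analysis would use each model's graded answer to each of the 238 questions.
rng = np.random.default_rng(0)
answers = np.column_stack([
    rng.permutation(np.r_[np.ones(k), np.zeros(N_QUESTIONS - k)]).astype(int)
    for k in correct.values()
])

print("Cochran's Q omnibus p-value:", cochrans_q(answers).pvalue)

# Pairwise exact McNemar tests on the discordant pairs, Bonferroni-corrected.
names = list(correct)
pairs, pvals = [], []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = answers[:, i], answers[:, j]
        table = np.array([[np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
                          [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))]])
        pairs.append(f"{names[i]} vs {names[j]}")
        pvals.append(mcnemar(table, exact=True).pvalue)

reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
for pair, p, r in zip(pairs, p_adj, reject):
    print(f"{pair}: adjusted p = {p:.4f}, significant = {r}")
```

For context, six pairwise comparisons among four models give a Bonferroni-adjusted threshold of 0.05/6 ≈ 0.0083; the reported p-values of 0.0016 and 0.0007 indicate significance whether they are read as raw p-values against that adjusted threshold or as already-adjusted p-values against 0.05.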
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11724709
DOI: http://dx.doi.org/10.7759/cureus.77292