Background: Conversational chatbots powered by large language models have sparked debate over their potential in education and medical career examinations, and the scientific integrity of their outputs remains contested in the literature.

Aims: This study evaluates the cross-sectional performance of ChatGPT 3.5 and Perplexity AI in answering questions from the 2023 Italian national residency admission exam (SSM23), comparing their results and inter-chatbot concordance with those of previous years' SSMs.

Methods: Gastroenterology-related SSM23 questions were input into ChatGPT 3.5 and Perplexity AI, and their performance was evaluated in terms of correct responses and total scores. The process was repeated with questions from the three preceding years' exams. Concordance between the chatbots was assessed using Cohen's kappa.
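The concordance measure referenced here, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance. The following is a minimal sketch of that calculation; the answer data, labels, and variable names are hypothetical illustrations, not values from the study.

```python
# Minimal sketch (hypothetical data): Cohen's kappa for agreement between
# two chatbots answering the same multiple-choice questions.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Compute Cohen's kappa between two equal-length label sequences."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters give the same answer.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical answer choices (A-D) given by each chatbot to the same questions.
chatgpt    = ["A", "C", "B", "D", "A", "B", "C", "A"]
perplexity = ["A", "C", "D", "D", "B", "B", "C", "C"]
print(f"kappa = {cohen_kappa(chatgpt, perplexity):.3f}")
```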

Results: On SSM23, ChatGPT 3.5 outperformed Perplexity AI, answering 94.11% of questions correctly, and showed consistent performance across years. Concordance between the two chatbots weakened in 2023 (κ = 0.203, P = 0.148), while ChatGPT 3.5 consistently maintained a higher standard than Perplexity AI.

Conclusion: ChatGPT 3.5 and Perplexity AI show promise in addressing gastroenterological queries, suggesting potential educational roles. However, their variable performance warrants cautious use as supplementary tools alongside conventional study methods. Clear guidelines are needed to help educators balance traditional approaches with these emerging systems while maintaining educational standards.

Source: http://dx.doi.org/10.1016/j.dld.2024.02.019

