How well do large language model-based chatbots perform in oral and maxillofacial radiology?

Dentomaxillofac Radiol

Department of Oral and Maxillofacial Radiology, Yonsei University College of Dentistry, Seoul 03722, Republic of Korea.

Published: September 2024

Objectives: This study evaluated the performance of four large language model (LLM)-based chatbots by comparing their test results with those of dental students on an oral and maxillofacial radiology examination.

Methods: ChatGPT, ChatGPT Plus, Bard, and Bing Chat were tested on 52 questions drawn from regular dental college examinations. The questions were categorized into three educational content areas (basic knowledge, imaging and equipment, and image interpretation) and classified as either multiple-choice questions (MCQs) or short-answer questions (SAQs). The chatbots' accuracy rates were compared with the students' performance, with further analysis by educational content area and question type.
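
The grouped comparison described in the Methods reduces to tallying correct answers per content area and per question type. A minimal sketch of that tally in Python follows; the records, field layout, and grading below are hypothetical stand-ins, since the study's actual data and scoring pipeline are not part of this abstract.

    from collections import defaultdict

    # Hypothetical graded records: (content_area, question_type, is_correct).
    # In the study these would be the 52 exam questions graded per chatbot.
    results = [
        ("basic knowledge", "MCQ", True),
        ("imaging and equipment", "MCQ", False),
        ("image interpretation", "SAQ", False),
    ]

    def accuracy_by(field, records):
        # Tally correct and total answers per group, return percentages.
        correct, total = defaultdict(int), defaultdict(int)
        for record in records:
            key = record[field]
            total[key] += 1
            correct[key] += int(record[2])
        return {k: 100.0 * correct[k] / total[k] for k in total}

    print(accuracy_by(0, results))  # by educational content area
    print(accuracy_by(1, results))  # by question type (MCQ vs. SAQ)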

Results: The students' overall accuracy rate was 81.2%, while that of the chatbots varied: 50.0% for ChatGPT, 65.4% for ChatGPT Plus, 50.0% for Bard, and 63.5% for Bing Chat. ChatGPT Plus achieved a higher accuracy rate for basic knowledge than the students (93.8% vs. 78.7%). However, all chatbots performed poorly in image interpretation, with accuracy rates below 35.0%. All chatbots scored less than 60.0% on MCQs, but performed better on SAQs.
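
As a quick consistency check, the chatbot percentages can be converted back to raw counts, assuming each overall rate is taken over the full 52-question set (an assumption; the abstract does not give denominators). A short Python sketch:

    # Convert reported accuracy rates back to raw counts, assuming every
    # overall rate is computed over all 52 questions (an assumption; the
    # abstract does not give per-chatbot denominators).
    TOTAL_QUESTIONS = 52
    reported = {
        "ChatGPT": 50.0,
        "ChatGPT Plus": 65.4,
        "Bard": 50.0,
        "Bing Chat": 63.5,
    }
    for name, pct in reported.items():
        correct = round(pct / 100 * TOTAL_QUESTIONS)
        print(f"{name}: ~{correct}/{TOTAL_QUESTIONS} ({correct / TOTAL_QUESTIONS:.1%})")

Rounding reproduces the reported rates exactly (26/52 = 50.0%, 34/52 ≈ 65.4%, 33/52 ≈ 63.5%), which is consistent with the full-set assumption.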

Conclusions: The performance of chatbots in oral and maxillofacial radiology was unsatisfactory. Further training using specific, relevant data derived solely from reliable sources is required. Additionally, the validity of these chatbots' responses must be meticulously verified.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11358622
DOI: http://dx.doi.org/10.1093/dmfr/twae021

Publication Analysis

Top Keywords

oral maxillofacial: 12
large language: 8
maxillofacial radiology: 8
bing chat: 8
educational content: 8
basic knowledge: 8
image interpretation: 8
accuracy rates: 8
accuracy rate: 8
chatbots: 7
