Evaluation of Large language model performance on the Multi-Specialty Recruitment Assessment (MSRA) exam

Introduction: AI-powered platforms have gained prominence in medical education and training, with applications ranging from surgical performance assessment to exam preparation. This paper examines the capabilities of Large Language Models (LLMs), including Llama 2, Google Bard, Bing Chat, and ChatGPT-3.5, in answering multiple-choice questions from the Clinical Problem Solving (CPS) paper of the Multi-Specialty Recruitment Assessment (MSRA) exam.

Methods: Using a dataset of 100 CPS questions drawn from ten subject categories, we compared the LLMs' performance with that of medical doctors preparing for the exam. A minimal sketch of this kind of evaluation harness follows.
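
The paper does not include code; the sketch below only illustrates what an MCQ evaluation loop of this kind might look like. Everything in it is an assumption rather than a detail from the study: ask_model is a hypothetical stand-in for whichever chat interface is under test, and the question format is invented.

# Minimal sketch of an MCQ evaluation harness (not the authors' code).
# ask_model is a hypothetical wrapper around the LLM under test, and the
# question format below is assumed, not taken from the paper.

def ask_model(model_name: str, prompt: str) -> str:
    """Placeholder: send the prompt to the named LLM and return its reply."""
    raise NotImplementedError("wire up the relevant chat API here")

def evaluate(model_name: str, questions: list[dict]) -> float:
    """Return the fraction of MCQs the model answers correctly.

    Each question is assumed to look like:
    {"stem": "...", "options": {"A": "...", "B": "..."}, "answer": "B"}
    """
    correct = 0
    for q in questions:
        options = "\n".join(f"{k}. {v}" for k, v in sorted(q["options"].items()))
        prompt = (f"{q['stem']}\n{options}\n"
                  "Answer with the single letter of the best option.")
        reply = ask_model(model_name, prompt).strip().upper()
        if reply[:1] == q["answer"]:
            correct += 1
    return correct / len(questions)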

Results: Bing Chat outperformed all other LLMs and even surpassed human users from the Qbank question bank, whereas Llama 2 performed worse than human users. Google Bard and ChatGPT-3.5 showed no statistically significant differences in correct response rates compared with human candidates. Pairwise comparisons demonstrated Bing Chat's significant superiority over Llama 2, Google Bard, and ChatGPT-3.5, while no significant differences were found between Llama 2 and Google Bard, between Llama 2 and ChatGPT-3.5, or between Google Bard and ChatGPT-3.5.
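
As a rough illustration of how such pairwise comparisons can be run, the sketch below applies Fisher's exact test with a Bonferroni correction. The correct-answer counts are placeholders, not the study's data, and the paper's exact statistical procedure is not restated here.

# Pairwise comparison of correct-response rates (illustrative only;
# the counts are invented and the study's actual tests may differ).
from itertools import combinations
from scipy.stats import fisher_exact

n_questions = 100
correct = {  # hypothetical correct answers out of 100
    "Bing Chat": 90,
    "ChatGPT-3.5": 75,
    "Google Bard": 70,
    "Llama 2": 55,
}

pairs = list(combinations(correct, 2))
alpha = 0.05 / len(pairs)  # Bonferroni correction for six comparisons
for a, b in pairs:
    table = [[correct[a], n_questions - correct[a]],
             [correct[b], n_questions - correct[b]]]
    _, p = fisher_exact(table)
    verdict = "significant" if p < alpha else "not significant"
    print(f"{a} vs {b}: p = {p:.4f} ({verdict} at corrected alpha = {alpha:.4f})")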

Discussion: Freely available LLMs have already demonstrated that they can match or even outperform human users in answering MSRA exam questions, with Bing Chat emerging as a particularly strong performer. The study also highlights the potential for enhancing LLMs' medical knowledge through tailored fine-tuning; medically tailored LLMs such as Med-PaLM have already shown promising results.
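
Tailored fine-tuning of the kind mentioned above typically means continuing training on domain-specific text. The minimal sketch below uses the Hugging Face transformers Trainer; the base model, dataset file, and hyperparameters are all placeholder assumptions, not details from the paper or from Med-PaLM.

# Minimal causal-LM fine-tuning sketch (placeholder model, data, settings).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama defines no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical file of medical question-explanation pairs, one per line.
dataset = load_dataset("text", data_files={"train": "medical_qa.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llm-medical-ft",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()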

Conclusion: This study provides insight into LLMs' competence in answering medical MCQs and into their potential integration into medical education and assessment processes.

Source
http://dx.doi.org/10.1016/j.compbiomed.2023.107794

Publication Analysis

Top Keywords

google bard: 20
llama google: 12
bing chat: 12
human users: 12
large language: 8
multi-specialty recruitment: 8
recruitment assessment: 8
assessment msra: 8
msra exam: 8
medical education: 8
