Artificial Intelligence for Anesthesiology Board-Style Examination Questions: Role of Large Language Models.

J Cardiothorac Vasc Anesth

Department of Anesthesia, Critical Care, and Pain Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA.

Published: May 2024

New artificial intelligence tools have been developed that have implications for medical usage. Large language models (LLMs), such as the widely used ChatGPT developed by OpenAI, have not been explored in the context of anesthesiology education. Understanding the reliability of various publicly available LLMs for medical specialties could offer insight into their understanding of the physiology, pharmacology, and practical applications of anesthesiology. An exploratory prospective review was conducted using 3 commercially available LLMs--OpenAI's ChatGPT GPT-3.5 version (GPT-3.5), OpenAI's ChatGPT GPT-4 (GPT-4), and Google's Bard--on questions from a widely used anesthesia board examination review book. Of the 884 eligible questions, the overall correct answer rates were 47.9% for GPT-3.5, 69.4% for GPT-4, and 45.2% for Bard. GPT-4 exhibited significantly higher performance than both GPT-3.5 and Bard (p = 0.001 and p < 0.001, respectively). None of the LLMs met the criteria required to secure American Board of Anesthesiology certification, according to the 70% passing score approximation. GPT-4 significantly outperformed GPT-3.5 and Bard in terms of overall performance, but lacked consistency in providing explanations that aligned with scientific and medical consensus. Although GPT-4 shows promise, current LLMs are not sufficiently advanced to answer anesthesiology board examination questions with passing success. Further iterations and domain-specific training may enhance their utility in medical education.
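The comparison against the 70% passing approximation can be sketched numerically. The snippet below is a minimal illustration, not the authors' analysis: it takes the three reported correct-answer rates and the 884-question total from the abstract, and it assumes (the abstract does not say so) that every model was scored on all 884 questions and that questions can be treated as independent trials. The Wilson score interval and the helper names `wilson_interval` and `summarize` are choices made here for illustration.

```python
import math

# Figures reported in the abstract.
N_QUESTIONS = 884
PASS_THRESHOLD = 0.70
REPORTED_RATES = {"GPT-3.5": 0.479, "GPT-4": 0.694, "Bard": 0.452}


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def summarize(rates=REPORTED_RATES, n=N_QUESTIONS, threshold=PASS_THRESHOLD):
    """Attach an interval and a pass/fail flag to each reported rate.

    Assumption: each model answered all n questions (hypothetical; the
    abstract reports only the overall rates).
    """
    rows = {}
    for model, rate in rates.items():
        rows[model] = {
            "rate": rate,
            "ci": wilson_interval(rate, n),
            "passes": rate >= threshold,
        }
    return rows


if __name__ == "__main__":
    for model, row in summarize().items():
        low, high = row["ci"]
        print(f"{model}: {row['rate']:.1%} "
              f"(approx. 95% CI {low:.1%} to {high:.1%}), "
              f"meets 70% threshold: {row['passes']}")
```

Under these assumptions no model reaches the threshold on its point estimate, and only the GPT-4 interval (roughly 66% to 72%) straddles the 70% mark, which is consistent with the abstract's description of GPT-4 as promising but not passing.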

Source
http://dx.doi.org/10.1053/j.jvca.2024.01.032

Similar Publications

Background: Acute kidney injury (AKI) is a common complication in hospitalized older patients, associated with increased morbidity, mortality, and health care costs. Major adverse kidney events within 30 days (MAKE30), a composite of death, new renal replacement therapy, or persistent renal dysfunction, has been recommended as a patient-centered endpoint for clinical trials involving AKI.

Objective: This study aimed to develop and validate a machine learning-based model to predict MAKE30 in hospitalized older patients with AKI.

Background: Mental health chatbots have emerged as a promising tool for providing accessible and convenient support to individuals in need. Building on our previous research on digital interventions for loneliness and depression among Korean college students, this study addresses the limitations identified and explores more advanced artificial intelligence-driven solutions.

Objective: This study aimed to develop and evaluate the performance of HoMemeTown Dr.

Purpose: Immune checkpoint inhibitors (ICIs) have demonstrated promise in the treatment of various cancers. Single-drug ICI therapy (immuno-oncology [IO] monotherapy) that targets PD-L1 is the standard of care in patients with advanced non-small cell lung cancer (NSCLC) with PD-L1 expression ≥50%. We sought to find out if a machine learning (ML) algorithm can perform better as a predictive biomarker than PD-L1 alone.

Object detection in motion management scenarios based on deep learning.

PLoS One

January 2025

School of Physical Education, Jinjiang College, Sichuan University, Chengdu, Sichuan Province, People's Republic of China.

In competition and daily training, improving an athlete's performance usually requires analyzing the athlete's actions at specific moments, which makes it especially important to quickly and accurately identify the categories and positions of athletes, sports equipment, field boundaries, and other targets in the scene. However, existing detection methods do not achieve good results in this setting. Our analysis attributes this mainly to the loss of temporal information, multiple targets, target overlap, and the coupling of regression and classification tasks, all of which make it harder for these network models to adapt to the detection task in this scenario. Based on this, we propose, for the first time, a supervised object detection method for scenarios in the field of motion management.

One way to treat type II diabetes mellitus is to use an α-glucosidase inhibitor, which slows postprandial glucose intake. In this research, metabolomics analysis of Artabotrys sumatranus leaf extract was used to predict which compounds in the extract act as α-glucosidase inhibitors. Both multivariate statistical analysis and machine learning approaches were used to improve the confidence of the predictions.
