Background: Patients are increasingly using artificial intelligence (AI) chatbots to seek answers to medical queries.

Methods: Ten frequently asked questions in anaesthesia were posed to three AI chatbots: ChatGPT4 (OpenAI), Bard (Google), and Bing Chat (Microsoft). Each chatbot's answers were evaluated in a randomised, blinded order by five residency programme directors from 15 medical institutions in the USA. Three medical content quality categories (accuracy, comprehensiveness, safety) and three communication quality categories (understandability, empathy/respect, and ethics) were scored between 1 and 5 (1 representing worst, 5 representing best).

Results: ChatGPT4 and Bard outperformed Bing Chat (median [interquartile range] scores: 4 [3-4], 4 [3-4], and 3 [2-4], respectively; P<0.001 with all metrics combined). All AI chatbots performed poorly in accuracy (score of ≥4 by 58%, 48%, and 36% of experts for ChatGPT4, Bard, and Bing Chat, respectively), comprehensiveness (score ≥4 by 42%, 30%, and 12% of experts for ChatGPT4, Bard, and Bing Chat, respectively), and safety (score ≥4 by 50%, 40%, and 28% of experts for ChatGPT4, Bard, and Bing Chat, respectively). Notably, answers from ChatGPT4, Bard, and Bing Chat differed statistically in comprehensiveness (ChatGPT4, 3 [2-4] vs Bing Chat, 2 [2-3], P<0.001; and Bard, 3 [2-4] vs Bing Chat, 2 [2-3], P=0.002). All large language model chatbots performed well with no statistical difference for understandability (P=0.24), empathy (P=0.032), and ethics (P=0.465).
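The summary statistics reported above (median [interquartile range] and the share of experts giving a score of ≥4) can be computed along the following lines. This is a minimal sketch only: the function name `summarise` and the example ratings are hypothetical and are not taken from the study, and the inclusive quartile method is an assumption, since the abstract does not state how quartiles were calculated.

```python
from statistics import median, quantiles


def summarise(scores):
    """Return (median, Q1, Q3, share of scores >= 4) for 1-5 ratings."""
    # Quartiles via linear interpolation over the observed range
    # (assumed method; the abstract does not specify one).
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    share_high = sum(s >= 4 for s in scores) / len(scores)
    return median(scores), q1, q3, share_high


# Hypothetical ratings for illustration only, NOT data from the study.
example_scores = [2, 3, 3, 4, 4, 4, 5, 3, 4, 2]
med, q1, q3, share = summarise(example_scores)
print(f"{med} [{q1}-{q3}], score >=4 by {share:.0%} of ratings")
```

Because the scores are ordinal, the median and interquartile range are used here rather than the mean and standard deviation.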

Conclusions: In answering frequently asked patient questions in anaesthesia, the chatbots performed well on communication metrics but were suboptimal on medical content metrics. Overall, ChatGPT4 and Bard were comparable to each other, and both outperformed Bing Chat.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11099318
DOI: http://dx.doi.org/10.1016/j.bjao.2024.100280

