Recently developed chatbots based on large language models (hereafter called bots) have promising features that could facilitate medical education. Several bots are freely available, but their proficiency has been insufficiently evaluated. In this study, the authors tested the current performance of six widely used bots on the multiple-choice medical licensing exam of the University of Antwerp (Belgium): ChatGPT (OpenAI), Bard (Google), New Bing (Microsoft), Claude Instant (Anthropic), Claude+ (Anthropic) and GPT-4 (OpenAI). The primary outcome was performance on the exam, expressed as the proportion of correct answers. Secondary analyses were done for several features of the exam questions: easy versus difficult questions, grammatically positive versus negative questions, and clinical vignettes versus theoretical questions. Reasoning errors and untruthful statements (hallucinations) in the bots' answers were also examined. All bots passed the exam; Bing and GPT-4 (both 76% correct answers) outperformed the other bots (62-67%, p = 0.03) and students (61%). Bots performed worse on difficult questions (62%, p = 0.06), but their advantage over students (32%) was even larger on those questions (p<0.01). Hallucinations were found in 7% of Bing's and GPT-4's answers, significantly fewer than in the answers of Bard (22%, p<0.01) and Claude Instant (19%, p = 0.02). Although the creators of all bots try to some extent to prevent their bots from being used as a medical doctor, none of the tested bots succeeded, as no bot refused to answer all clinical case questions. Bing was able to detect weak or ambiguous exam questions. Bots could be used as a time-efficient tool to improve the quality of a multiple-choice exam.
| Link | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10866461 | PMC |
| http://dx.doi.org/10.1371/journal.pdig.0000349 | DOI Listing |
Indian J Crit Care Med
December 2024
Department of Medical-Surgical Nursing, Faculty of Nursing, Ain Shams University, Cairo, Egypt.
Background: This study aims to assess the knowledge of Palestinian critical care nurses regarding the prevention of ventilator-associated pneumonia (VAP), an acquired infection that affects critically ill patients on ventilators in hospitals. Nurses caring for these patients may not always be aware of the most effective methods to prevent VAP.
Materials and Methods: A descriptive cross-sectional study was conducted in five government hospitals in the Gaza Strip, Palestine, over 3 months.
Cureus
December 2024
Dermatology, All India Institute of Medical Sciences, Nagpur, Nagpur, IND.
Introduction: Leprosy is a common infectious disease in India that can lead to nerve damage and disability. There is a dearth of knowledge regarding leprosy not only among the general public but also among healthcare workers. This knowledge gap leads to the generation of stigma and delay in the detection of new cases.
BMC Med Educ
January 2025
Centre for Disaster Medicine, University of Gothenburg, Gothenburg, Sweden.
Background: Chemical, biological and nerve gas events have a significant impact on public health, necessitating proper education and training. This study investigated the educational needs as perceived by two groups, frontline healthcare workers and medical students, in relation to chemical, biological, and nerve gas events.
Methods: Three distinct web-based cross-sectional surveys were conducted, one each for chemical, biological, and nerve gas events, with each survey following the same structural format including sections on (a) theoretical knowledge assessment, using multiple-choice questions regarding identification, protection, and treatment, (b) perception of threat, using questions based on a 5-point Likert scale to gauge views on threat/preparedness and (c) perception of existing competency, with questions regarding prior education and the need for additional education and training.
J Med Internet Res
December 2024
Guangzhou Cadre and Talent Health Management Center, Guangzhou, China.
Background: Large language models have shown remarkable efficacy in various medical research and clinical applications. However, their skills in medical image recognition and subsequent report generation or question answering (QA) remain limited.
Objective: We aim to finetune a multimodal, transformer-based model for generating medical reports from slit lamp images and develop a QA system using Llama2.
Morphologie
January 2025
Laboratório de Anatomia Humana, Instituto de Educação Física e Esportes, Universidade Federal do Ceará, Fortaleza, Brazil; Programa de Pós-Graduação em Ciências Morfofuncionais, Departamento de Morfologia, Faculdade de Medicina, Universidade Federal do Ceará, Fortaleza, Brazil.
Background: Gross human anatomy is essential in undergraduate programs across biological and health sciences. While extensive literature explores medical students' knowledge in this area, studies on non-medical students, particularly those in physical education, are scarce.
Objective: This study assessed the anatomy knowledge among Brazilian physical education students and explored differences based on employment status, type of class instruction (face-to-face vs.