Background: The rapid progress in artificial intelligence, machine learning, and natural language processing has led to increasingly sophisticated large language models (LLMs) for use in healthcare. This study assesses the performance of two LLMs, GPT-3.5 and GPT-4, on the MIR medical examination, which governs access to medical specialist training in Spain. Our objectives were to gauge the models' overall performance, analyze differences across medical specialties, distinguish between theoretical and practical questions, estimate error proportions, and assess how severe the errors would hypothetically be if committed by a physician.
Materials and Methods: We studied the 2022 Spanish MIR examination, excluding questions that required image evaluation or had acknowledged errors. The remaining 182 questions were presented to GPT-4 and GPT-3.5 in both Spanish and English. Logistic regression models were used to analyze how question length and question sequence related to performance. We also analyzed the 23 image-based questions separately, using GPT-4's new image analysis capability.
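The abstract does not give the exact model specification, so the following is only one plausible form of such a logistic regression, written as an assumption. Here $Y_i$, $L_i$, and $S_i$ are illustrative symbols introduced for this sketch (not taken from the study): $Y_i = 1$ if question $i$ is answered correctly, $L_i$ is its length, and $S_i$ is its position in the examination sequence. The study may instead have fitted length and sequence in separate models.

```latex
\operatorname{logit}\Pr(Y_i = 1)
  = \ln\frac{p_i}{1 - p_i}
  = \beta_0 + \beta_1 L_i + \beta_2 S_i ,
\qquad
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 L_i + \beta_2 S_i)}}
```

Under this form, $e^{\beta_1}$ is the odds ratio for a correct answer per unit increase in question length, holding sequence position fixed.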
Results: GPT-4 outperformed GPT-3.5, scoring 86.81% in Spanish (p < 0.001). Performance was slightly higher on the English translations. On the image-based questions, GPT-4 answered 26.1% correctly in English and 13.0% in Spanish, although the difference was not statistically significant (p = 0.250). Among medical specialties, GPT-4 achieved a 100% correct response rate in several areas, whereas Pharmacology, Critical Care, and Infectious Diseases showed lower performance. The error analysis revealed an overall error rate of 13.2%, but the gravest categories, such as "error requiring intervention to sustain life" and "error resulting in death", had a 0% rate.
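The reported percentages map back to whole-question counts, which makes them easier to check. The sketch below derives those counts (158/182 text questions correct; 6/23 and 3/23 image questions); these counts are inferred from the percentages and denominators, not stated in the abstract. The abstract also does not name the test behind p = 0.250. As a purely hypothetical check, an exact two-sided McNemar test gives exactly 0.25 if the three-question English/Spanish gap consisted only of discordant pairs favouring English. The helper `exact_mcnemar_p` is an illustrative name, not from the study.

```python
from math import comb


def exact_mcnemar_p(b: int, c: int) -> float:
    """Two-sided exact McNemar test: a binomial test (p = 0.5) on the discordant pairs."""
    n = b + c
    k = min(b, c)
    # Probability of k or fewer discordant pairs in one direction, doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts inferred from the reported percentages (not reported directly in the abstract).
text_total, text_correct = 182, 158          # 158/182 = 86.81% correct in Spanish
image_total = 23
image_correct_en, image_correct_es = 6, 3    # 26.1% (English) vs 13.0% (Spanish)

print(f"Text accuracy: {text_correct / text_total:.2%}")
print(f"Text error rate: {(text_total - text_correct) / text_total:.1%}")
print(f"Image accuracy: EN {image_correct_en / image_total:.1%}, ES {image_correct_es / image_total:.1%}")

# HYPOTHETICAL: assume 3 questions right only in English and 0 right only in Spanish.
print(f"Exact McNemar p (assumed 3 vs 0 discordant): {exact_mcnemar_p(3, 0)}")
```

Running this prints 86.81%, 13.2%, 26.1% and 13.0%, matching the abstract. The McNemar line prints 0.25 only under the stated discordant-pair assumption.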
Conclusions: GPT-4 performs robustly on the Spanish MIR examination, although its performance varies across specialties. Although the model's success rate is high, understanding the severity of its errors is critical, especially when considering AI's potential role in real-world medical practice and its implications for patient safety.
Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10660543
DOI: http://dx.doi.org/10.3390/clinpract13060130