Rationale and Objectives: In this study, we evaluate GPT-4's performance on the American College of Radiology (ACR) 2022 Diagnostic Radiology In-Training Examination (DXIT). We repeat the evaluation at multiple time points to assess for model drift, and again after fine-tuning to assess for changes in accuracy.
Materials and Methods: Questions were input sequentially into GPT-4 with a standardized prompt. Each answer was recorded, and we calculated overall accuracy, logic-adjusted accuracy, and accuracy on image-based questions. The experiment was repeated several months later to assess for model drift, and again after fine-tuning to assess for changes in GPT-4's performance.
Results: GPT-4 achieved 58.5% overall accuracy, lower than the PGY-3 average (61.9%) but higher than the PGY-2 average (52.8%). Logic-adjusted accuracy was 52.8%. GPT-4 showed significantly higher confidence for correct answers (87.1%) than for incorrect answers (84.0%; p = 0.012). Accuracy on image-based questions was significantly lower than on text-only questions (45.4% vs. 80.0%; p < 0.001), and logic-adjusted accuracy on image-based questions was 36.4%. When the questions were repeated, GPT-4 chose a different answer 25.5% of the time, with no change in accuracy. Fine-tuning did not improve accuracy.
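The accuracy figures and the answer-change rate reported above reduce to simple proportions over per-question records. The following is a minimal sketch, assuming each response is stored as a multiple-choice letter alongside the answer key and a flag marking image-based questions. The QuestionResult type, the function names, and the toy data are hypothetical illustrations, not the study's code or results. Logic-adjusted accuracy is omitted because the abstract does not define its scoring rule.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    qid: str             # hypothetical question identifier
    correct_answer: str  # answer-key letter, e.g. "B"
    model_answer: str    # letter chosen by the model in this run
    image_based: bool    # True if the question includes an image


def accuracy(results: list[QuestionResult]) -> float:
    """Fraction of questions where the model's choice matches the key."""
    if not results:
        return 0.0
    return sum(r.model_answer == r.correct_answer for r in results) / len(results)


def subset_accuracy(results: list[QuestionResult], image_based: bool) -> float:
    """Accuracy restricted to image-based (True) or text-only (False) questions."""
    return accuracy([r for r in results if r.image_based == image_based])


def answer_change_rate(run_a: list[QuestionResult], run_b: list[QuestionResult]) -> float:
    """Fraction of questions present in both runs that were answered differently."""
    answers_b = {r.qid: r.model_answer for r in run_b}
    shared = [r for r in run_a if r.qid in answers_b]
    if not shared:
        return 0.0
    return sum(r.model_answer != answers_b[r.qid] for r in shared) / len(shared)


# Toy data for illustration only (not from the study).
run1 = [
    QuestionResult("q1", "A", "A", False),
    QuestionResult("q2", "B", "C", True),
    QuestionResult("q3", "D", "D", True),
    QuestionResult("q4", "C", "C", False),
]
run2 = [
    QuestionResult("q1", "A", "A", False),
    QuestionResult("q2", "B", "B", True),
    QuestionResult("q3", "D", "A", True),
    QuestionResult("q4", "C", "C", False),
]

print(accuracy(run1))                           # 0.75
print(subset_accuracy(run1, image_based=True))  # 0.5
print(answer_change_rate(run1, run2))           # 0.5
```

In this toy example, half of the answers change between runs while accuracy stays at 0.75 in both, which shows how a high answer-change rate can coexist with unchanged accuracy.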
Conclusion: GPT-4 performed between PGY-2 and PGY-3 levels on the 2022 DXIT, performed significantly worse on image-based questions, and showed large variability in answer choices across time points. Exploratory fine-tuning experiments did not improve performance. This study underscores both the potential and the risks of using minimally prompted general-purpose AI models as diagnostic tools for interpreting radiologic images. Implementers of general AI radiology systems should exercise caution given the possibility of spurious yet confident responses.
DOI: http://dx.doi.org/10.1016/j.acra.2024.04.006