Purpose: Large language models (LLMs) have demonstrated a level of competency in medical tasks. The aim of this study was to explore the ability of LLMs to predict the best neuroradiologic imaging modality given specific clinical presentations. In addition, the authors sought to determine whether LLMs could outperform an experienced neuroradiologist in this regard.
Methods: ChatGPT and Glass AI, a health care-based LLM by Glass Health, were used. ChatGPT was prompted to rank the three best neuroimaging modalities, whereas only the single best response was taken from Glass AI and from the neuroradiologist. Responses were compared against the ACR Appropriateness Criteria for 147 clinical conditions. Each clinical scenario was passed to each LLM twice to account for stochasticity. Each output was scored out of 3 on the basis of the criteria, with partial credit given for nonspecific answers.
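The abstract does not include the authors' scoring code; as a purely illustrative sketch of the protocol described above (all function names, field names, and the partial-credit value are assumptions, not taken from the study), the per-scenario scoring could look like this:

```python
# Illustrative sketch only; not the authors' implementation.
# Assumes a simplified representation of the ACR Appropriateness Criteria
# as a set of "usually appropriate" modality strings per clinical condition.

FULL_CREDIT = 1.0      # exact match to an appropriate modality
PARTIAL_CREDIT = 0.5   # nonspecific answer (assumed value; not stated in the abstract)


def score_ranked_responses(ranked_modalities, appropriate_modalities):
    """Score up to three ranked modality suggestions for one clinical
    scenario against the criteria (maximum score of 3)."""
    score = 0.0
    for modality in ranked_modalities[:3]:
        if modality in appropriate_modalities:
            score += FULL_CREDIT
        elif any(modality in appropriate for appropriate in appropriate_modalities):
            # Nonspecific answer, e.g. "MRI" when the criteria list
            # "MRI head without contrast" - give partial credit.
            score += PARTIAL_CREDIT
    return score


def score_scenario(model, scenario, appropriate_modalities, runs=2):
    """Each scenario is passed to the model twice to account for stochasticity."""
    return [
        score_ranked_responses(model(scenario), appropriate_modalities)
        for _ in range(runs)
    ]
```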
Results: ChatGPT and Glass AI scored 1.75 and 1.83, respectively, with no statistically significant difference between them. The neuroradiologist scored 2.20, significantly outperforming both LLMs. ChatGPT was also the more inconsistent of the two LLMs, with a statistically significant score difference between its two runs. Additionally, the differences in scores among ChatGPT's first-, second-, and third-ranked responses were statistically significant.
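The abstract does not name the statistical tests used. As an assumption only, a paired nonparametric comparison such as the Wilcoxon signed-rank test over the 147 per-condition scores could be run as follows (variable names hypothetical):

```python
# Illustrative only; not the authors' analysis.
# Assumes per-condition scores are paired across the two raters being compared.
from scipy.stats import wilcoxon


def compare_raters(scores_a, scores_b, alpha=0.05):
    """Paired nonparametric comparison of two raters' scores over the same
    set of clinical conditions."""
    statistic, p_value = wilcoxon(scores_a, scores_b)
    return p_value, p_value < alpha


# Example usage (hypothetical score arrays of length 147):
# p, significant = compare_raters(chatgpt_scores, radiologist_scores)
```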
Conclusions: LLMs perform well in selecting appropriate neuroradiologic imaging procedures when prompted with specific clinical scenarios. ChatGPT performed on par with Glass AI, suggesting that training on medical text could significantly improve ChatGPT's function in this application. Neither LLM outperformed an experienced neuroradiologist, indicating the need for continued improvement in the medical context.
DOI: 10.1016/j.jacr.2023.06.008