Purpose: This study aimed to investigate how language selection and translation quality affect the accuracy of Generative Pre-trained Transformer-4 (GPT-4) responses to expert-level diagnostic radiology questions.
Materials And Methods: We analyzed 146 diagnostic radiology questions from the Japan Radiology Board Examination (2020-2022), with consensus answers provided by two board-certified radiologists. The questions, originally in Japanese, were translated into English by both GPT-4 and DeepL, and into German and Chinese by GPT-4.
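As an illustration only, the comparison described above can be thought of as scoring each language or translation condition against the radiologists' consensus answers. The following is a minimal sketch under stated assumptions: the function name `accuracy_by_condition`, the data layout, and the toy answers are hypothetical and are not taken from the study, which does not describe its scoring code here.

```python
from collections import defaultdict


def accuracy_by_condition(responses, consensus):
    """Return the fraction of correct answers per condition.

    responses: iterable of (condition, question_id, answer) tuples (hypothetical layout).
    consensus: dict mapping question_id to the consensus answer.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, question_id, answer in responses:
        total[condition] += 1
        if answer == consensus[question_id]:
            correct[condition] += 1
    return {condition: correct[condition] / total[condition] for condition in total}


# Toy data, purely illustrative; not results from the study.
consensus = {"q1": "a", "q2": "c"}
responses = [
    ("Japanese", "q1", "a"),
    ("Japanese", "q2", "b"),
    ("English (DeepL)", "q1", "a"),
    ("English (DeepL)", "q2", "c"),
]

print(accuracy_by_condition(responses, consensus))
```

Keeping the condition label on each response lets the same function cover the original Japanese questions and every translated variant without change.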
Purpose: We assessed the accuracy of large language models (LLMs) in answering clinical radiology questions, comparing the performance of ChatGPT, GPT-4, and Google Bard on questions from the Japan Radiology Board Examination (JRBE).
Materials And Methods: A total of 103 questions from the 2022 JRBE were used, with permission from the Japan Radiological Society.