Purpose: This study evaluated how well large language models (LLMs) understand radiation safety and protection. We assessed the performance of Generative Pre-trained Transformer (GPT)-4 (OpenAI, USA) and Gemini Advanced (Google DeepMind, London) on questions from the First-Class Radiation Protection Supervisor Examination in Japan.

Methods: GPT-4 and Gemini Advanced answered questions from the 68th First-Class Radiation Protection Supervisor Examination in Japan. The numbers of correct and incorrect answers were analyzed by subject, the presence or absence of calculation, the passage length, and the format (textual or graphical questions). The results of GPT-4 and Gemini Advanced were then compared.

Results: The overall accuracy rates of GPT-4 and Gemini Advanced were 71.0% and 65.3%, respectively. Accuracy differed significantly by subject (P < 0.0001 for GPT-4 and P = 0.0127 for Gemini Advanced), and the accuracy rate for laws and regulations was lower than for the other subjects. Neither the presence or absence of calculation nor the passage length made a significant difference. Both LLMs performed significantly better on textual questions than on graphical questions (P = 0.0003 for GPT-4 and P < 0.0001 for Gemini Advanced). The performance of the two LLMs did not differ significantly from each other by subject, the presence or absence of calculation, the passage length, or the format.

Conclusions: GPT-4 and Gemini Advanced demonstrated sufficient understanding of physics, chemistry, biology, and practical operations to meet the passing standard for the average score. However, their performance in laws and regulations was insufficient, possibly because of frequent revisions and the complexity of detailed regulations, and further machine learning is required.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11526626
DOI: http://dx.doi.org/10.7759/cureus.70614
