Background: Artificial intelligence advancements have enabled large language models to significantly impact radiology education and diagnostic accuracy.
Objective: This study evaluates the performance of mainstream large language models, including GPT-4, Claude, Bard, Tongyi Qianwen, and Gemini Pro, in radiology board exams.
Methods: A comparative analysis of 150 multiple-choice questions from radiology board exams without images was conducted. Models were assessed on their accuracy for text-based questions and were categorized by cognitive levels and medical specialties using χ2 tests and ANOVA.
Results: GPT-4 achieved the highest accuracy (83.3%, 125/150), significantly outperforming all other models. Specifically, Claude achieved an accuracy of 62% (93/150; P<.001), Bard 54.7% (82/150; P<.001), Tongyi Qianwen 70.7% (106/150; P=.009), and Gemini Pro 55.3% (83/150; P<.001). The odds ratios compared to GPT-4 were 0.33 (95% CI 0.18-0.60) for Claude, 0.24 (95% CI 0.13-0.44) for Bard, and 0.25 (95% CI 0.14-0.45) for Gemini Pro. Tongyi Qianwen performed relatively well with an accuracy of 70.7% (106/150; P=0.02) and had an odds ratio of 0.48 (95% CI 0.27-0.87) compared to GPT-4. Performance varied across question types and specialties, with GPT-4 excelling in both lower-order and higher-order questions, while Claude and Bard struggled with complex diagnostic questions.
Conclusions: GPT-4 and Tongyi Qianwen show promise in medical education and training. The study emphasizes the need for domain-specific training datasets to enhance large language models' effectiveness in specialized fields like radiology.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11756834 | PMC |
http://dx.doi.org/10.2196/64284 | DOI Listing |
J Med Internet Res
January 2025
Department of Engineering Management and Systems Engineering, George Washington University, Washington, DC, United States.
Background: Large language model (LLM) artificial intelligence chatbots using generative language can offer smoking cessation information and advice. However, little is known about the reliability of the information provided to users.
Objective: This study aims to examine whether 3 ChatGPT chatbots-the World Health Organization's Sarah, BeFreeGPT, and BasicGPT-provide reliable information on how to quit smoking.
JMIR Med Educ
January 2025
College of Medicine, Alfaisal University, Takhasussi street, Riyadh, 11533, Saudi Arabia, 966 559441589.
Background: There has been a rise in the popularity of ChatGPT and other chat-based artificial intelligence (AI) apps in medical education. Despite data being available from other parts of the world, there is a significant lack of information on this topic in medical education and research, particularly in Saudi Arabia.
Objective: The primary objective of the study was to examine the familiarity, usage patterns, and attitudes of Alfaisal University medical students toward ChatGPT and other chat-based AI apps in medical education.
In 2021, a year before ChatGPT took the world by storm amid the excitement about generative artificial intelligence (AI), AlphaFold 2 cracked the 50-year-old protein-folding problem, predicting three-dimensional (3D) structures for more than 200 million proteins from their amino acid sequences. This accomplishment was a precursor to an unprecedented burgeoning of large language models (LLMs) in the life sciences. That was just the beginning.
View Article and Find Full Text PDFPLoS One
January 2025
Department of Information Technologies, Faculty of Economics and Management, Czech University of Life Sciences Prague, Prague, Czech Republic.
Social networks are a battlefield for political propaganda. Protected by the anonymity of the internet, political actors use computational propaganda to influence the masses. Their methods include the use of synchronized or individual bots, multiple accounts operated by one social media management tool, or different manipulations of search engines and social network algorithms, all aiming to promote their ideology.
View Article and Find Full Text PDFPLoS One
January 2025
School of Industrial and Management Engineering, Korea University, Seongbuk-gu, Seoul, Republic of Korea.
A medical specialty prediction system for remote diagnosis can reduce the unexpected costs incurred by first-visit patients who visit the wrong hospital department for their symptoms. To develop medical specialty prediction systems, several researchers have explored clinical predictive models using real medical text data. Medical text data include large amounts of information regarding patients, which increases the sequence length.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!