Performance of ChatGPT and Bard on the medical licensing examinations varies across different cultures: a comparison study.

Yikai Chen, Xiujie Huang, Fangjie Yang, Haiming Lin, Haoyu Lin, Zhuoqun Zheng, Qifeng Liang, Jinhai Zhang, Xinxin Li

BMC Med Educ

Department of Gastrointestinal Surgery, The First Affiliated Hospital of Shantou University Medical College, No. 57 Changping Road, Jinping District, Shantou, Guangdong, 515000, China.

Published: November 2024

Background: This study aimed to evaluate the performance of GPT-3.5, GPT-4, GPT-4o and Google Bard on the United States Medical Licensing Examination (USMLE), the Professional and Linguistic Assessments Board (PLAB), the Hong Kong Medical Licensing Examination (HKMLE) and the National Medical Licensing Examination (NMLE).

Methods: This study was conducted in June 2023. Four large language models (LLMs), namely GPT-3.5, GPT-4, GPT-4o and Google Bard, were applied to four standardized medical tests (USMLE, PLAB, HKMLE and NMLE). All questions were multiple-choice and were sourced from the question banks of these examinations.

Results: On USMLE Step 1, Step 2CK and Step 3, accuracy rates were 91.5%, 94.2% and 92.7% for GPT-4o; 93.2%, 95.0% and 92.0% for GPT-4; 65.6%, 71.6% and 68.5% for GPT-3.5; and 64.3%, 55.6% and 58.1% for Google Bard, respectively. On PLAB, HKMLE and NMLE, GPT-4o scored 93.3%, 91.7% and 84.9%; GPT-4 scored 86.7%, 89.6% and 69.8%; GPT-3.5 scored 80.0%, 68.1% and 60.4%; and Google Bard scored 54.2%, 71.7% and 61.3%. The accuracy rates of the four LLMs differed significantly across the four medical licensing examinations.
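
The reported accuracy rates are easier to compare in tabular form. The following is a minimal Python sketch, not part of the study, that stores the figures quoted above and finds the top-scoring model on each examination. The names `EXAMS`, `ACCURACY`, `best_model_per_exam` and `unweighted_mean` are hypothetical and introduced only for illustration. The mean is unweighted because the abstract does not give question counts, so it should not be read as an overall accuracy.

```python
# Accuracy rates (%) copied from the Results paragraph; nothing else is assumed.
EXAMS = ["USMLE Step 1", "USMLE Step 2CK", "USMLE Step 3", "PLAB", "HKMLE", "NMLE"]
ACCURACY = {
    "GPT-4o": [91.5, 94.2, 92.7, 93.3, 91.7, 84.9],
    "GPT-4": [93.2, 95.0, 92.0, 86.7, 89.6, 69.8],
    "GPT-3.5": [65.6, 71.6, 68.5, 80.0, 68.1, 60.4],
    "Google Bard": [64.3, 55.6, 58.1, 54.2, 71.7, 61.3],
}


def best_model_per_exam(accuracy, exams):
    """Return {exam: (model, accuracy)} for the highest-scoring model on each exam."""
    best = {}
    for i, exam in enumerate(exams):
        model = max(accuracy, key=lambda m: accuracy[m][i])
        best[exam] = (model, accuracy[model][i])
    return best


def unweighted_mean(accuracy):
    """Unweighted mean across exams (question counts are not given in the abstract)."""
    return {m: round(sum(v) / len(v), 1) for m, v in accuracy.items()}


if __name__ == "__main__":
    for exam, (model, acc) in best_model_per_exam(ACCURACY, EXAMS).items():
        print(f"{exam}: {model} ({acc}%)")
    print(unweighted_mean(ACCURACY))
```

On the reported figures, GPT-4 is marginally ahead on USMLE Step 1 and Step 2CK, and GPT-4o leads on the remaining four examinations.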

Conclusion: GPT-4o performed better in the medical licensing examinations than the other three LLMs. The performance of all four models on the NMLE needs further improvement.

Clinical Trial Number: Not applicable.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11590336	PMC
http://dx.doi.org/10.1186/s12909-024-06309-x	DOI Listing

Similar Publications

Qwen-2.5 Outperforms Other Large Language Models in the Chinese National Nursing Licensing Examination: Retrospective Cross-Sectional Comparative Study.

JMIR Med Inform

January 2025

Department of Science and Education, Shenzhen Baoan Women's and Children's Hospital, Shenzhen, China.

Shiben Zhu, Wanqin Hu, Zhi Yang, Jiani Yan, Fang Zhang

Background: Large language models (LLMs) have been proposed as valuable tools in medical education and practice. The Chinese National Nursing Licensing Examination (CNNLE) presents unique challenges for LLMs because it requires both deep domain-specific nursing knowledge and the ability to make complex clinical decisions, which differentiates it from more general medical examinations. However, the potential application of LLMs to the CNNLE remains unexplored.

Similar but Different Three Major Traditional Medicines in East Asia: A Bibliometric Analysis.

Chin J Integr Med

January 2025

Department of Oriental Neuropsychiatry, Dong-Eui University College of Korean Medicine, Busan, Republic of Korea.

Chan-Young Kwon

Objective: Traditional medicine (TM) has played a key role in the health care system of East Asian countries, including China, Japan and South Korea. This bibliometric study analyzes the recent research status of these three TMs, including traditional Chinese medicine (TCM), traditional Korean medicine (TKM), and Kampo medicine (KM).

Methods: Research topics of studies published in the past 10 years (2014 to 2023), identified through a search of MEDLINE via PubMed, were analyzed.

Repurposing Licensed Drugs with Activity Against Epstein-Barr Virus for Treatment of Multiple Sclerosis: A Systematic Approach.

CNS Drugs

January 2025

School of Medicine and Dentistry, Gold Coast Campus, Griffith University, Southport, QLD, 4222, Australia.

Vivien Li, Fiona C McKay, David C Tscharke, Corey Smith, Rajiv Khanna

Background: Epstein-Barr virus (EBV) is implicated as a necessary factor in the development of multiple sclerosis (MS) and may also be a driver of disease activity. Although it is not clear whether ongoing viral replication is the driver for MS pathology, MS researchers have considered the prospect of using drugs with potential efficacy against EBV in the treatment of MS. We have undertaken scientific and lived experience expert panel reviews to shortlist existing licensed therapies that could be used in later-stage clinical trials in MS.

Epidemiology of cutaneous sarcoidosis in an electronic health database: a cross-sectional analysis.

Arch Dermatol Res

January 2025

Department of Dermatology, Brigham and Women's Hospital, 221 Longwood Avenue, Boston, MA, 02115, USA.

Ahana Gaurav, Eric Xia, David Stein, Megan H Noe, Arash Mostaghimi

Procedural success prediction in chronic total occlusion percutaneous coronary intervention (CTO-PCI)-the rise of the machines?

Cardiovasc Diagn Ther

December 2024

Department of Cardiovascular, University Hospital Basel, Basel, Switzerland.

Claudiu Ungureanu, Gregor Leibundgut
