Background: This study aimed to evaluate the performance of GPT-3.5, GPT-4, GPT-4o and Google Bard on the United States Medical Licensing Examination (USMLE), the Professional and Linguistic Assessments Board (PLAB), the Hong Kong Medical Licensing Examination (HKMLE) and the National Medical Licensing Examination (NMLE).

Methods: This study was conducted in June 2023. Four large language models (LLMs), GPT-3.5, GPT-4, GPT-4o and Google Bard, were applied to four standardized medical examinations (USMLE, PLAB, HKMLE and NMLE). All questions were multiple-choice and were sourced from the question banks of these examinations.

Results: On USMLE Step 1, Step 2 CK and Step 3, accuracy rates were 91.5%, 94.2% and 92.7% for GPT-4o; 93.2%, 95.0% and 92.0% for GPT-4; 65.6%, 71.6% and 68.5% for GPT-3.5; and 64.3%, 55.6% and 58.1% for Google Bard, respectively. On PLAB, HKMLE and NMLE, GPT-4o scored 93.3%, 91.7% and 84.9%; GPT-4 scored 86.7%, 89.6% and 69.8%; GPT-3.5 scored 80.0%, 68.1% and 60.4%; and Google Bard scored 54.2%, 71.7% and 61.3%, respectively. There were significant differences in the accuracy rates of the four LLMs on the four medical licensing examinations.

Conclusion: GPT-4o performed better on the medical licensing examinations than the other three LLMs. The performance of all four models on the NMLE needs further improvement.

Clinical Trial Number: Not applicable.

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11590336 (PMC)
http://dx.doi.org/10.1186/s12909-024-06309-x (DOI)
