Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination.

Heliyon

Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan.

Published: July 2024

Background: Chat Generative Pre-Trained Transformer (ChatGPT) is a state-of-the-art large language model that has been evaluated across various medical fields, with mixed performance on licensing examinations. This study aimed to assess the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions from the Taiwan Plastic Surgery Board Examination.

Methods: The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on 1375 questions from the past 8 years of the Taiwan Plastic Surgery Board Examination, including 985 single-choice and 390 multiple-choice questions. We obtained the responses between June and July 2023, launching a new chat session for each question to eliminate memory retention bias.
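The per-question protocol described in the Methods can be sketched as a small scoring loop. This is an illustration only, not the authors' tooling: the `Question` record, the `ask_in_new_session` callable, and the exact-match scoring rule are all assumptions, since the abstract does not say how answers were collected or how multiple-choice responses were graded.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical question record: (question_type, prompt, correct_options).
Question = tuple[str, str, frozenset[str]]


def evaluate(
    questions: Iterable[Question],
    ask_in_new_session: Callable[[str], frozenset[str]],
) -> dict[str, float]:
    """Return accuracy per question type, sending each prompt independently."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for qtype, prompt, answer in questions:
        # One call per question; no history is shared between calls.
        predicted = ask_in_new_session(prompt)
        total[qtype] += 1
        # Assumed scoring rule: the selected options must match exactly.
        correct[qtype] += int(predicted == answer)
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


# Toy usage with a stand-in "model" that always answers A.
demo = [
    ("single", "Q1 ...", frozenset({"A"})),
    ("single", "Q2 ...", frozenset({"B"})),
    ("multiple", "Q3 ...", frozenset({"A", "C"})),
]
scores = evaluate(demo, lambda prompt: frozenset({"A"}))
print(scores)
```

Passing the model as a callable keeps the loop independent of any particular interface, so the same scoring can be reused for either model version.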

Results: Overall, ChatGPT-4 outperformed ChatGPT-3.5, achieving a 59% correct answer rate compared to 41% for ChatGPT-3.5. ChatGPT-4 passed five of the eight yearly exams, whereas ChatGPT-3.5 failed all eight. On single-choice questions, ChatGPT-4 scored 66% correct, compared to 48% for ChatGPT-3.5. On multiple-choice questions, ChatGPT-4 achieved a 43% correct rate, nearly double the 23% of ChatGPT-3.5.
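As a quick consistency check, the overall rates can be approximately recovered by weighting the per-type rates by the question counts given in the Methods. The sketch below uses only the rounded percentages reported in the abstract, so the pooled values are approximate; the function and variable names are illustrative.

```python
# Question counts are from the Methods; per-type accuracies are the rounded
# percentages from the Results, so the recombined totals are approximate.
N_SINGLE, N_MULTI = 985, 390

REPORTED = {
    "ChatGPT-4": {"single": 0.66, "multi": 0.43, "overall": 0.59},
    "ChatGPT-3.5": {"single": 0.48, "multi": 0.23, "overall": 0.41},
}


def pooled_accuracy(single_rate: float, multi_rate: float) -> float:
    """Weight each question type's accuracy by its share of the 1375 questions."""
    correct = single_rate * N_SINGLE + multi_rate * N_MULTI
    return correct / (N_SINGLE + N_MULTI)


for model, r in REPORTED.items():
    pooled = pooled_accuracy(r["single"], r["multi"])
    print(f"{model}: pooled {pooled:.1%}, reported {r['overall']:.0%}")
```

The pooled values land within one percentage point of the reported overall rates for both models, which is what rounding of the inputs would lead one to expect.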

Conclusion: As ChatGPT evolves, its performance on the Taiwan Plastic Surgery Board Examination is expected to improve further. The study suggests potential reforms, such as incorporating more problem-based scenarios, leveraging ChatGPT to refine exam questions, and integrating AI-assisted learning into candidate preparation. These advancements could enhance the assessment of candidates' critical thinking and problem-solving abilities in the field of plastic surgery.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11324965
DOI: http://dx.doi.org/10.1016/j.heliyon.2024.e34851

Similar Publications

Objective: The natural history of cephaloceles is not well understood. The goal of this study was to better understand the natural history of fetal cephaloceles from prenatal diagnosis to the postnatal period.

Methods: Between January 2013 and April 2023, all patients evaluated with a cephalocele at the Center for Fetal Diagnosis and Treatment were identified.

Median Craniofacial Hypoplasia.

J Craniofac Surg

January 2025

Division of Plastic & Reconstructive Surgery, John H. Stroger Hospital of Cook County, Chicago, IL.

Median craniofacial hypoplasia is characterized by tissue deficiency of the midline facial structures and/or brain. Patients can present with a wide variety of facial differences that may or may not require operative intervention. Common reconstructive procedures include cleft lip and/or palate repair, rhinoplasty, and orthognathic surgery, among others.

Thread-Filler: A Standardized Combination Therapy.

J Craniofac Surg

January 2025

Department of Plastic, Reconstructive, and Aesthetic Surgery, Bilkay Clinic, Izmir, Turkey.

Advanced technology and increasing knowledge about aging faces have combined to create the illusion that thread lifting can replace surgical interventions. However, results that came far beyond expectations led to a heavy suspicion of these tools. Nevertheless, combined treatments with fillers would have better outcomes with a synergistic effect.

Background: Financial toxicity is the detrimental impact of health care costs that must be mitigated to achieve universal health coverage. Catastrophic health expenditure (CHE) is widely used to measure financial toxicity but does not capture patient perspectives of unaffordable health care costs. Financial hardship (FH), a patient-reported outcome measure, is currently underutilized but may be an important adjunct metric.

Facial nerve dysfunction (FND) is a well-recognized but poorly documented complication of mandibular distraction osteogenesis (MDO) for Robin sequence (RS). This study aims to document the authors' experiences with FND and identify risk factors associated with this adverse event. A retrospective review of a prospectively gathered database was performed to identify patients with RS who underwent MDO at the authors' institution from March 2016 to June 2023.
