Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination.

Ching-Hua Hsieh Hsiao-Yun Hsieh Hui-Ping Lin

Heliyon

Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan.

Published: July 2024

Background: Chat Generative Pre-Trained Transformer (ChatGPT) is a state-of-the-art large language model that has been evaluated across various medical fields, with mixed performance on licensing examinations. This study aimed to assess the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions from the Taiwan Plastic Surgery Board Examination.

Methods: The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on 1375 questions from the past 8 years of the Taiwan Plastic Surgery Board Examination, including 985 single-choice and 390 multiple-choice questions. We obtained the responses between June and July 2023, launching a new chat session for each question to eliminate memory retention bias.

Results: Overall, ChatGPT-4 outperformed ChatGPT-3.5, achieving a 59 % correct answer rate compared to 41 % for ChatGPT-3.5. ChatGPT-4 passed five out of eight yearly exams, whereas ChatGPT-3.5 failed all. On single-choice questions, ChatGPT-4 scored 66 % correct, compared to 48 % for ChatGPT-3.5. On multiple-choice, ChatGPT-4 achieved a 43 % correct rate, nearly double the 23 % of ChatGPT-3.5.

Conclusion: As ChatGPT evolves, its performance on the Taiwan Plastic Surgery Board Examination is expected to improve further. The study suggests potential reforms, such as incorporating more problem-based scenarios, leveraging ChatGPT to refine exam questions, and integrating AI-assisted learning into candidate preparation. These advancements could enhance the assessment of candidates' critical thinking and problem-solving abilities in the field of plastic surgery.

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11324965	PMC
http://dx.doi.org/10.1016/j.heliyon.2024.e34851	DOI Listing

Publication Analysis

Top Keywords

plastic surgery

chatgpt-35 chatgpt-4

taiwan plastic

surgery board

performance chatgpt-35

board examination

chatgpt-35

chatgpt-4

plastic

surgery

Similar Publications

In utero progression of cephaloceles: prenatal to postnatal analysis.

J Neurosurg Pediatr

January 2025

1Division of Neurosurgery, Department of Surgery, Children's Hospital of Philadelphia.

Maria A Punchak Sanjana R Salwi Sierra D Land Sarah Hamimi Tom A Reynolds

Objective: The natural history of cephaloceles is not well understood. The goal of this study was to better understand the natural history of fetal cephaloceles from prenatal diagnosis to the postnatal period.

Methods: Between January 2013 and April 2023, all patients evaluated with a cephalocele at the Center for Fetal Diagnosis and Treatment were identified.

View Article and Find Full Text PDF

Similar Publications

Median Craniofacial Hypoplasia.

J Craniofac Surg

January 2025

Division of Plastic & Reconstructive Surgery, John H. Stroger Hospital of Cook County, Chicago, IL.

Brandon Alba Kelly A Harmon Okensama La-Anyane Christina Tragos

Median craniofacial hypoplasia is characterized by tissue deficiency of the midline facial structures and/or brain. Patients can present with a wide variety of facial differences that may or may not require operative intervention. Common reconstructive procedures include cleft lip and/or palate repair, rhinoplasty, and orthognathic surgery, among others.

View Article and Find Full Text PDF

Similar Publications

Thread-Filler: A Standardized Combination Therapy.

J Craniofac Surg

January 2025

Department of Plastic, Reconstructive, and Aesthetic Surgery, Bilkay Clinic, Izmir, Turkey.

Özge Öztürk Bilkay Mehmet Emre Yeğin Ufuk Bilkay

Advanced technology and increasing knowledge about aging faces have combined to create the illusion of thread lifting to replace surgical interventions. However, results that came far beyond expectations led to a heavy suspicion of these tools. However, combined treatments with fillers would have better outcomes with a synergetic effect.

View Article and Find Full Text PDF

Similar Publications

Use of Financial Hardship as a Metric for Assessing Financial Toxicity in Surgical Trauma Patients.

J Craniofac Surg

January 2025

Brigham & Women's Hospital, Boston, MA.

Anam N Ehsan Shivangi Saha Preet Hathi Srinivasan Vengadassalapathy Hamaiyal Sana

Background: Financial toxicity is the detrimental impact of health care costs that must be mitigated to achieve universal health coverage. Catastrophic health expenditure (CHE) is widely used to measure financial toxicity but does not capture patient perspectives of unaffordable health care costs. Financial hardship (FH), a patient-reported outcome measure, is currently underutilized but may be an important adjunct metric.

View Article and Find Full Text PDF

Similar Publications

Focused Investigation of Facial Nerve Dysfunction After Mandibular Distraction Osteogenesis for Robin Sequence.

J Craniofac Surg

January 2025

Division of Plastic and Reconstructive Surgery, Children's National Hospital.

Esperanza Mantilla-Rivas Sofia Finestone Hannah R Crowder Joseph M Escandon Md Sohel Rana

Facial nerve dysfunction (FND) is a well-recognized but poorly documented complication of mandibular distraction osteogenesis (MDO) for Robin sequence (RS). This study aims to document the authors' experiences with FND and identify risk factors associated with this adverse event. A retrospective review of a prospectively gathered database was performed to identify patients with RS who underwent MDO at the authors' institution from March 2016 to June 2023.

View Article and Find Full Text PDF

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!