A PHP Error was encountered

Severity: Warning

Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests

Filename: helpers/my_audit_helper.php

Line Number: 176

Backtrace:

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url

File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3122
Function: getPubMedXML

File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global

File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword

File: /var/www/html/index.php
Line: 316
Function: require_once

Efficacy of large language models and their potential in Obstetrics and Gynecology education. | LitMetric

Efficacy of large language models and their potential in Obstetrics and Gynecology education.

Obstet Gynecol Sci

Department of Obstetrics and Gynecology, Institute of Women's Medical Life Science, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea.

Published: November 2024

Objective: The performance of large language models (LLMs) and their potential utility in obstetric and gynecological education are topics of ongoing debate. This study aimed to contribute to this discussion by examining the recent advancements in LLM technology and their transformative potential in artificial intelligence.

Methods: This study assessed the performance of generative pre-trained transformer (GPT)-3.5 and -4 in understanding clinical information, as well as its potential implications for obstetric and gynecological education. Obstetrics and gynecology residents at three hospitals underwent an annual promotional examination, from which 116 of the 170 questions over 4 years (2020-2023) were analyzed, excluding 54 questions with images. The scores achieved by GPT-3.5, -4, and the 100 residents were compared.

Results: The average scores across all 4 years for GPT-3.5 and -4 were 38.79 (standard deviation [SD], 5.65) and 79.31 (SD, 3.67), respectively. For groups first-year resident, second-year resident, and third-year resident, the cumulative annual average scores were 79.12 (SD, 9.00), 80.95 (SD, 5.86), and 83.60 (SD, 6.82), respectively. No statistically significant differences were observed between the scores of GPT-4.0 and those of the residents. When analyzing questions specific to obstetrics, the average scores for GPT-3.5 and -4.0 were 33.44 (SD, 10.18) and 90.22 (SD, 7.68), respectively.

Conclusion: GPT-4 demonstrated exceptional performance in obstetrics, different types of data interpretation, and problem solving, showcasing the potential utility of LLMs in these areas. However, acknowledging the constraints of LLMs is crucial and their utilization should augment human expertise and discernment.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11581811PMC
http://dx.doi.org/10.5468/ogs.24211DOI Listing

Publication Analysis

Top Keywords

average scores
12
large language
8
language models
8
obstetrics gynecology
8
potential utility
8
obstetric gynecological
8
gynecological education
8
potential
5
scores
5
efficacy large
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!