Severity: Warning
Message: file_get_contents(https://...@pubfacts.com&api_key=b8daa3ad693db53b1410957c26c9a51b4908&a=1): Failed to open stream: HTTP request failed! HTTP/1.1 429 Too Many Requests
Filename: helpers/my_audit_helper.php
Line Number: 176
Backtrace:
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 176
Function: file_get_contents
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 250
Function: simplexml_load_file_from_url
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 1034
Function: getPubMedXML
File: /var/www/html/application/helpers/my_audit_helper.php
Line: 3152
Function: GetPubMedArticleOutput_2016
File: /var/www/html/application/controllers/Detail.php
Line: 575
Function: pubMedSearch_Global
File: /var/www/html/application/controllers/Detail.php
Line: 489
Function: pubMedGetRelatedKeyword
File: /var/www/html/index.php
Line: 316
Function: require_once
Purpose: We aimed to assess the appropriateness of ChatGPT in providing answers related to prostate cancer (PCa) screening, comparing GPT-3.5 and GPT-4.
Methods: A committee of five reviewers designed 30 questions related to PCa screening, categorized into three difficulty levels. The questions were formulated identically for both GPTs three times, varying the prompts. Each reviewer assigned a score for accuracy, clarity, and conciseness. The readability was assessed by the Flesch Kincaid Grade (FKG) and Flesch Reading Ease (FRE). The mean scores were extracted and compared using the Wilcoxon test. We compared the readability across the three different prompts by ANOVA.
Results: In GPT-3.5 the mean score (SD) for accuracy, clarity, and conciseness was 1.5 (0.59), 1.7 (0.45), 1.7 (0.49), respectively for easy questions; 1.3 (0.67), 1.6 (0.69), 1.3 (0.65) for medium; 1.3 (0.62), 1.6 (0.56), 1.4 (0.56) for hard. In GPT-4 was 2.0 (0), 2.0 (0), 2.0 (0.14), respectively for easy questions; 1.7 (0.66), 1.8 (0.61), 1.7 (0.64) for medium; 2.0 (0.24), 1.8 (0.37), 1.9 (0.27) for hard. GPT-4 performed better for all three qualities and difficulty levels than GPT-3.5. The FKG mean for GPT-3.5 and GPT-4 answers were 12.8 (1.75) and 10.8 (1.72), respectively; the FRE for GPT-3.5 and GPT-4 was 37.3 (9.65) and 47.6 (9.88), respectively. The 2nd prompt has achieved better results in terms of clarity (all p < 0.05).
Conclusions: GPT-4 displayed superior accuracy, clarity, conciseness, and readability than GPT-3.5. Though prompts influenced the quality response in both GPTs, their impact was significant only for clarity.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1007/s11255-024-04009-5 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!