ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? - A Memorial Sloan Kettering Cancer Center Team Ovary study.

Gynecol Oncol

Gynecology Service, Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA; Department of Obstetrics and Gynecology, Weill Cornell Medical College, New York, NY, USA.

Published: October 2024

AI Article Synopsis

  • The study compared the performance of the chatbot ChatGPT (GPT-4) to the National Comprehensive Cancer Network (NCCN) Guidelines in managing ovarian cancer by evaluating 10 key questions.
  • ChatGPT, both unprompted and prompted, was found to have a higher accuracy and completeness in its responses compared to NCCN for sections like risk factors and surgical management, but it lagged in medical management.
  • Despite GPT-4's promising performance, the presence of inaccuracies indicates that unsupervised use of chatbots for medical guidance should be approached with caution.

Article Abstract

Objectives: We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer.

Methods: Using NCCN Guidelines, we generated 10 questions and answers regarding management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate and incomplete, and 2 as accurate and complete. Evaluations were compared among NCCN, unprompted GPT, and prompted GPT answers.

Results: Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate.

Conclusions: GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11402584 (PMC)
http://dx.doi.org/10.1016/j.ygyno.2024.07.007 (DOI)

Publication Analysis

Top Keywords

unprompted gpt (16), prompted gpt (16), accurate complete (16), management ovarian (12), ovarian cancer (12), risk factors (12), factors surgical (12), surgical management (12), management surveillance (12), compared national (8)

Similar Publications

This pilot study investigated the use of generative AI (ChatGPT) to produce Boolean search strings to query PubMed. The goals were to determine whether ChatGPT could be used in search string formation and, if so, which approach was most effective. Research outputs from published systematic reviews were compared to outputs from AI-generated search strings.

Article Synopsis
  • Large language model (LLM) chatbots, like ChatGPT and Bard, can provide information about benign prostatic hyperplasia surgery, with their response quality improving significantly when prompted with specific criteria.
  • A study evaluated the information quality and readability of these chatbots, finding that unprompted answers were rated moderately, but prompting improved the quality substantially.
  • Overall, while the chatbots were generally accurate and complete in responding to simulated patient queries, the readability of their responses was poor, indicating a need for improvement in patient education materials.

Enhancing Patient Communication With Chat-GPT in Radiology: Evaluating the Efficacy and Readability of Answers to Common Imaging-Related Questions.

J Am Coll Radiol

February 2024

Associate Professor, Department of Radiology, Section Chief, Abdominal Imaging, and Medical Director, Radiology Practice and Operational Excellence, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania.

Purpose: To assess ChatGPT's accuracy, relevance, and readability in answering patients' common imaging-related questions and examine the effect of a simple prompt.

Methods: A total of 22 imaging-related questions were developed from categories previously described as important to patients, as follows: safety, the radiology report, the procedure, preparation before imaging, meaning of terms, and medical staff. These questions were posed to ChatGPT with and without a short prompt instructing the model to provide an accurate and easy-to-understand response for the average person.

