Objectives: We evaluated the performance of a chatbot compared to the National Comprehensive Cancer Network (NCCN) Guidelines for the management of ovarian cancer.
Methods: Using the NCCN Guidelines, we generated 10 questions and answers regarding the management of ovarian cancer at a single point in time. Questions were thematically divided into risk factors, surgical management, medical management, and surveillance. We asked ChatGPT (GPT-4) to provide responses without prompting (unprompted GPT) and with prompt engineering (prompted GPT). Responses were blinded and evaluated for accuracy and completeness by 5 gynecologic oncologists. A score of 0 was defined as inaccurate, 1 as accurate but incomplete, and 2 as accurate and complete. Evaluations were compared among the NCCN, unprompted GPT, and prompted GPT answers.
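The abstract does not reproduce the questions or the engineered prompt, so the sketch below is only a rough illustration of the unprompted-versus-prompted comparison using the OpenAI Python client; the question text, system preamble, and model identifier are assumptions rather than the authors' materials.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical question in one of the study's theme areas (surveillance).
question = "What surveillance is recommended after primary treatment of ovarian cancer?"

# Hypothetical prompt-engineering preamble; the study's actual prompt is not published here.
PREAMBLE = (
    "You are a gynecologic oncologist. Answer according to the current NCCN "
    "Guidelines for ovarian cancer, and give a complete but concise answer."
)

def ask(question: str, prompted: bool) -> str:
    """Send one question to GPT-4, with or without the engineered system prompt."""
    messages = [{"role": "system", "content": PREAMBLE}] if prompted else []
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

unprompted_answer = ask(question, prompted=False)
prompted_answer = ask(question, prompted=True)
```

Answers gathered this way would then be blinded alongside the corresponding NCCN text before being scored by the reviewers.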
Results: Overall, 48% of responses from NCCN, 64% from unprompted GPT, and 66% from prompted GPT were accurate and complete. The percentage of accurate but incomplete responses was higher for NCCN vs GPT-4. The percentage of accurate and complete scores for questions regarding risk factors, surgical management, and surveillance was higher for GPT-4 vs NCCN; however, for questions regarding medical management, the percentage was lower for GPT-4 vs NCCN. Overall, 14% of responses from unprompted GPT, 12% from prompted GPT, and 10% from NCCN were inaccurate.
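The underlying reviewer ratings are not reported here; the following sketch uses made-up scores only to show how pooled 0/1/2 ratings reduce to the percentages of inaccurate, accurate but incomplete, and accurate and complete responses reported above.

```python
from collections import Counter

# Hypothetical data: scores[source] is a list of per-question lists of 5 reviewer ratings.
# 0 = inaccurate, 1 = accurate but incomplete, 2 = accurate and complete.
scores = {
    "NCCN":           [[2, 2, 1, 2, 1], [1, 1, 2, 2, 2]],
    "unprompted GPT": [[2, 2, 2, 1, 2], [2, 2, 2, 2, 0]],
    "prompted GPT":   [[2, 2, 2, 2, 1], [2, 2, 2, 2, 2]],
}

for source, per_question in scores.items():
    pooled = [rating for ratings in per_question for rating in ratings]
    counts = Counter(pooled)
    total = len(pooled)
    print(f"{source}: "
          f"{100 * counts[2] / total:.0f}% accurate and complete, "
          f"{100 * counts[1] / total:.0f}% accurate but incomplete, "
          f"{100 * counts[0] / total:.0f}% inaccurate")
```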
Conclusions: GPT-4 provided accurate and complete responses at a single point in time to a limited set of questions regarding ovarian cancer, with best performance in areas of risk factors, surgical management, and surveillance. Occasional inaccuracies, however, should limit unsupervised use of chatbots at this time.
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11402584
DOI: http://dx.doi.org/10.1016/j.ygyno.2024.07.007
Med Ref Serv Q, December 2024. Penn State College of Medicine, University Park Program, State College, PA, USA.
This pilot study investigated the use of generative AI (ChatGPT) to produce Boolean search strings to query PubMed. The goals were to determine whether ChatGPT could be used in search string formation and, if so, which approach was most effective. Research outputs from published systematic reviews were compared with outputs from AI-generated search strings.
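As a hedged illustration of how such an AI-generated string could be run against PubMed programmatically (the pilot's actual strings and tooling are not shown in this summary, and the use of Biopython here is an assumption):

```python
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI requires a contact email for E-utilities requests

# Hypothetical Boolean string of the kind ChatGPT might return for a review topic.
boolean_string = (
    '("ovarian neoplasms"[MeSH Terms] OR "ovarian cancer"[Title/Abstract]) '
    'AND ("artificial intelligence"[MeSH Terms] OR chatbot*[Title/Abstract])'
)

# Retrieve the match count and the first PMIDs; comparing these against the
# records found by a published review's own strategy is one simple benchmark.
handle = Entrez.esearch(db="pubmed", term=boolean_string, retmax=20)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} records; first PMIDs: {record['IdList']}")
```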
J Am Coll Radiol, February 2024. Department of Radiology, Abdominal Imaging, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania, USA.
Purpose: To assess ChatGPT's accuracy, relevance, and readability in answering patients' common imaging-related questions and to examine the effect of a simple prompt.
Methods: A total of 22 imaging-related questions were developed from categories previously described as important to patients: safety, the radiology report, the procedure, preparation before imaging, meaning of terms, and medical staff. These questions were posed to ChatGPT with and without a short prompt instructing the model to provide an accurate and easy-to-understand response for the average person.
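Readability is one of this study's stated outcomes; as a minimal sketch (the abstract does not name the metric or software used, so the textstat package below is an assumption), a response could be scored like this:

```python
import textstat

# Hypothetical ChatGPT answer to one of the 22 imaging-related questions.
answer = (
    "An MRI scan uses a strong magnetic field and radio waves to make detailed "
    "pictures of the inside of your body. It does not use radiation."
)

# Common readability formulas; lower grade levels suggest easier-to-understand text.
print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid Grade:", textstat.flesch_kincaid_grade(answer))
```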