Introduction: Large Language Models (LLMs), such as the GPT model family from OpenAI, have demonstrated transformative potential across various fields, especially in medicine. These models can understand and generate contextual text and adapt to new tasks without task-specific training. This versatility could improve clinical practice by enhancing documentation, patient interaction, and decision-making. In oncology, LLMs could significantly improve patient care through continuous monitoring of chemotherapy-induced toxicities, a task that is often unmanageable with human staff alone. However, existing research has not sufficiently explored how accurately LLMs identify and assess subjective toxicities from patient descriptions. This study aims to fill this gap by evaluating the ability of LLMs to classify these toxicities accurately, with the goal of facilitating personalized and continuous patient care.

Methods: This comparative pilot study assessed the ability of an LLM to classify subjective toxicities from chemotherapy. Thirteen oncologists evaluated 30 fictitious cases created using expert knowledge and OpenAI's GPT-4. Their evaluations, based on the Common Terminology Criteria for Adverse Events (CTCAE) v5.0, were compared with those of a contextualized LLM. The mode and the mean of the oncologists' responses were used to gauge consensus. The accuracy of the LLM was analyzed in both general and specific toxicity categories, taking into account the types of errors and false alarms. The results are intended to justify further research involving real patients.
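The consensus-and-accuracy step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the tie-breaking rule for the mode, the rounding rule for the mean, and the example grades are all assumptions made for illustration.

```python
from statistics import mean, multimode


def consensus_grade(ratings, method="mode"):
    """Collapse several raters' grades for one item into one reference grade."""
    if method == "mode":
        # Assumption: ties between equally common grades go to the lower grade.
        return min(multimode(ratings))
    if method == "mean":
        # Assumption: the mean is rounded half up to the nearest whole grade.
        return int(mean(ratings) + 0.5)
    raise ValueError(f"unknown method: {method!r}")


def accuracy(predicted, reference):
    """Fraction of items where the predicted grade equals the reference grade."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Hypothetical grades from three raters on three items (not study data).
ratings_per_item = [[1, 1, 2], [2, 3, 3], [0, 0, 1]]
llm_grades = [1, 2, 0]

reference = [consensus_grade(r, "mode") for r in ratings_per_item]
print(reference, accuracy(llm_grades, reference))
```

Scoring against the mode rewards agreement with the most common expert answer, while scoring against the mean is more sensitive to how far individual raters diverge; the abstract reports that both were used.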

Results: The study revealed significant variability in oncologists' evaluations due to the lack of interaction with fictitious patients. Scored against the mean of the oncologists' evaluations, the LLM achieved an accuracy of 85.7% in general categories and 64.6% in specific categories; of its errors, 96.4% were mild and 3.6% were severe. False alarms occurred in 3% of cases. By comparison, the accuracy of individual expert oncologists ranged from 66.7% to 89.2% for general categories and from 57.0% to 76.0% for specific categories. The 95% confidence intervals for the median accuracy of oncologists were 81.9% to 86.9% for general categories and 67.6% to 75.6% for specific categories. These benchmarks highlight the LLM's potential to achieve expert-level performance in classifying chemotherapy-induced toxicities.
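As a quick arithmetic check on the figures reported above, the following Python sketch tests whether each LLM accuracy falls inside the range of individual oncologists and inside the 95% confidence interval for their median accuracy. The variable and function names are illustrative assumptions; only the numbers come from the abstract.

```python
# Figures copied from the Results paragraph (percent accuracy).
LLM_ACCURACY = {"general": 85.7, "specific": 64.6}
ONCOLOGIST_RANGE = {"general": (66.7, 89.2), "specific": (57.0, 76.0)}
MEDIAN_CI_95 = {"general": (81.9, 86.9), "specific": (67.6, 75.6)}


def within(value, bounds):
    """Return True if value lies inside the closed interval given by bounds."""
    low, high = bounds
    return low <= value <= high


for category, acc in LLM_ACCURACY.items():
    in_range = within(acc, ONCOLOGIST_RANGE[category])
    in_ci = within(acc, MEDIAN_CI_95[category])
    print(f"{category}: in individual range={in_range}, in median 95% CI={in_ci}")
# general: in individual range=True, in median 95% CI=True
# specific: in individual range=True, in median 95% CI=False
```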

Discussion: The findings indicate that LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to that of expert oncologists. The LLM achieved 85.7% accuracy in general categories and 64.6% in specific categories. The general-category result falls within both the range of individual oncologists and the 95% confidence interval for their median accuracy. The specific-category result lies within the individual range (57.0% to 76.0%) but below the confidence interval for the median (67.6% to 75.6%), so it requires improvement. The study's limitations include the use of fictitious cases, the lack of patient interaction, and reliance on audio transcriptions. Nevertheless, LLMs show significant potential for enhancing patient monitoring and reducing oncologists' workload. Future research should focus on training LLMs specifically for medical tasks, conducting studies with real patients, implementing interactive evaluations, expanding sample sizes, and ensuring robustness and generalization across diverse clinical settings.

Conclusions: This study concludes that LLMs can classify subjective toxicities from chemotherapy with accuracy comparable to expert oncologists. The LLM's performance in general toxicity categories is within the expert range, but there is room for improvement in specific categories. LLMs have the potential to enhance patient monitoring, enable early interventions, and reduce severe complications, improving care quality and efficiency. Future research should involve specific training of LLMs, validation with real patients, and the incorporation of interactive capabilities for real-time patient interactions. Ethical considerations, including data accuracy, transparency, and privacy, are crucial for the safe integration of LLMs into clinical practice.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11352281
DOI: http://dx.doi.org/10.3390/cancers16162830

Similar Publications

A corpus of Chinese word segmentation agreement.

Behav Res Methods

December 2024

Department of Education Studies, Hong Kong Baptist University, Kowloon Tong, Kowloon, Hong Kong.

The absence of explicit word boundaries is a distinctive characteristic of Chinese script, setting it apart from most alphabetic scripts, leading to word boundary disagreement among readers. Previous studies have examined how this feature may influence reading performance. However, further investigations are required to generate more ecologically valid and generalizable findings.

Rural Revitalization (RR) is a key national strategy in China aimed at sustainable rural development and has gained significant attention. Given the unique characteristics of different villages, understanding differentiated paths to achieve RR is essential. This study introduces a new "5I Framework" (INDUS-INHAB-INDOC-INFRA-INCOM) to assess RR's overall development status (ODS) and differentiated paths.

Background: While large language models like ChatGPT-4 have demonstrated competency in English, their performance for minority groups speaking underrepresented languages, as well as their ability to adapt to specific socio-cultural nuances and regional cuisines, such as those in Central Asia (e.g., Kazakhstan), still requires further investigation.

The rhythm of horse gaits.

Ann N Y Acad Sci

December 2024

Department of Human Neurosciences, Sapienza University of Rome, Rome, Italy.

What makes animal gaits so audibly rhythmic? To answer this question, we recorded the footfall sound of 19 horses and quantified the rhythmic differences in the temporal structure of three natural gaits: walk, trot, and canter. Our analyses show that each gait displays a strikingly specific rhythmic pattern and that all gaits are organized according to small-integer ratios, those found when adjacent temporal intervals are related by a mathematically simple relationship of integer numbers. Walk and trot exhibit an isochronous structure (1:1), similar to a ticking clock, while canter is characterized by three small-integer ratios (1:1, 1:2, 2:1).

This retrospective cohort study aimed to define the optimal Regions of Homozygosity (ROH) size cut-offs for prediction of morbidity, based on 13 483 Chromosomal Microarray Analyses (CMA). Receiver operating characteristic (ROC) curves were generated, and area under the curve (AUC) was used to assess the predictive capability of total ROH percentage (TRPS), ROH number and ROH segment size in distinguishing between healthy (n=6,196) and affected (n=6,839) cohorts. The metrics were examined for telomeric and interstitial segments, distinct TRPS categories, and across different ancestral origins.
