Background: Breast cancer-related lymphedema (BCRL) is a common complication after breast cancer surgery. It can lead to limb swelling, deformity, and upper limb dysfunction, with a serious impact on patients' physical and mental health and quality of life. Previous studies have mostly analyzed the influencing factors with statistical methods such as linear regression and logistic regression, which have certain limitations. Machine learning (ML), an important branch of artificial intelligence, can better handle interactions among multiple variables and collinearity. This study aimed to explore the factors influencing the occurrence of BCRL in breast cancer patients, to construct a predictive model with ML algorithms on that basis, and to validate its predictive value.

Methods: Clinical data of breast cancer patients admitted to Hainan Cancer Hospital from September 2018 to May 2024 were retrospectively collected. BCRL was the outcome measure, and the data were divided into training and validation sets in a ratio of 7:3. In the training set, random forest (RF), support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost) algorithms were used to construct predictive models. The discrimination of the models was evaluated with receiver operating characteristic (ROC) curve analysis, sensitivity, specificity, and F1 score. The calibration of the models was assessed using calibration curves and the Hosmer-Lemeshow (H-L) chi-squared test.
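The discrimination metrics named above can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration only. The AUC is computed as the proportion of (BCRL, non-BCRL) pairs in which the BCRL case receives the higher predicted score, with ties counted as half.

```python
from typing import Sequence, Tuple


def sensitivity_specificity_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, F1) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, specificity, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive case ranked above a negative one counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = developed BCRL, 0 = did not.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print(sensitivity_specificity_f1(toy_labels, toy_preds))
print(roc_auc(toy_labels, toy_scores))
```

Sensitivity, specificity, and F1 depend on the chosen classification threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of measure are usually reported together.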

Results: Two hundred and forty patients who met the inclusion criteria were included and randomly divided into a training set (168 patients) and a validation set (72 patients) in a 7:3 ratio. In the training set, 44 patients developed BCRL and 124 did not. There were statistically significant differences (P<0.05) between the BCRL and non-BCRL groups in hypertension history, number of dissected lymph nodes, postoperative complications, postoperative functional exercises, chemotherapy, radiotherapy, tumor-node-metastasis (TNM) stage, and level of axillary lymph node dissection. Among the models compared, the XGBoost model showed the best predictive performance, with an area under the curve (AUC) of 0.99 in the training set and 0.89 in the validation set. The XGBoost model was also well calibrated in both the training and validation sets, with calibration curves close to the ideal line.
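As a rough illustration of the 7:3 random split, the sketch below shuffles 240 placeholder patient identifiers and cuts them into 168 training and 72 validation records. The function name, the fixed seed, and the integer identifiers are hypothetical; the abstract does not state whether the authors stratified the split by BCRL status or which software they used.

```python
import random
from typing import Iterable, List, Tuple


def split_train_validation(
    patient_ids: Iterable[int], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly split identifiers into training and validation lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 240 placeholder identifiers standing in for the included patients.
train_ids, validation_ids = split_train_validation(range(240))
print(len(train_ids), len(validation_ids))
```

A simple shuffle like this does not guarantee the same BCRL proportion in both sets; a stratified split would be one way to enforce that, if desired.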

Conclusions: The ML-based XGBoost model for predicting BCRL showed excellent performance and may help healthcare professionals assess the risk of BCRL rapidly and accurately.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11733644
DOI: http://dx.doi.org/10.21037/gs-24-252
