Objective: This study aimed to develop and validate a biomarker-based model for predicting metastasis in patients with colorectal cancer (CRC).
Methods: Two datasets, GSE68468 and GSE41568, were retrieved from the Gene Expression Omnibus (GEO) database. In the GSE68468 dataset, key biomarkers were identified through a screening process comprising differential expression analysis, redundancy analysis, and recursive feature elimination. The prediction model was then developed and internally validated using five machine learning (ML) algorithms: lasso and elastic-net regularized generalized linear model (glmnet), k-nearest neighbors (kNN), support vector machine (SVM) with a radial basis function kernel, random forest (RF), and eXtreme Gradient Boosting (XGBoost). The algorithm with the highest accuracy was then externally validated on the GSE41568 dataset.
Results: Among 22,283 registered genes in the GSE68468 dataset, the screening process identified 16 key genes, which were used to build the prediction model. On the internal validation dataset, the predictive performance of the five ML algorithms was as follows: RF (accuracy = 0.97, kappa = 0.91), XGBoost (0.93, 0.81), kNN (0.93, 0.81), glmnet (0.93, 0.82), and SVM (0.92, 0.80). Top five biomarkers were and . The RF model achieved an accuracy of 0.97, a kappa of 0.92, and an area under the curve (AUC) of 0.99 on the external validation dataset.
Conclusion: Using ML algorithms, this study identified biomarkers that help identify patients with CRC who are prone to metastasis.
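The Results report each model by accuracy and kappa. As a minimal sketch of how these two metrics relate, the Python below computes accuracy and Cohen's kappa from paired true and predicted labels. The function name `accuracy_and_kappa` and the example labels are hypothetical illustrations; they are not the study's code or data, and the abstract does not state which software was used.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(y_true)

    # Observed agreement p_o: fraction of samples predicted correctly (accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Expected chance agreement p_e, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    # It is undefined when chance agreement is already perfect.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")
    return p_o, kappa


# Hypothetical labels for illustration only (not taken from GSE68468 or GSE41568).
y_true = ["met"] * 4 + ["non"] * 6
y_pred = ["met", "met", "met", "non", "non", "non", "non", "non", "non", "met"]

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(round(acc, 3), round(kappa, 3))  # 0.8 0.583
```

Because kappa discounts the agreement expected by chance, it is lower than accuracy whenever the classes are imbalanced, which is one reason both values are useful when comparing classifiers.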
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748706 | PMC |
http://dx.doi.org/10.1016/j.heliyon.2024.e41443 | DOI Listing |