Evaluation of Feature Selection Techniques for Breast Cancer Risk Prediction.

Int J Environ Res Public Health

Department of Electrical, Systems and Automatic Engineering, Universidad de León, Campus de Vegazana s/n, 24071 León, Spain.

Published: October 2021

Breast cancer is a major public health problem. This study evaluates several feature ranking techniques in combination with machine learning classifiers, with two goals: identifying the factors most relevant to the probability of developing breast cancer, and improving the performance of breast cancer risk prediction models for a healthy population. The dataset, with 919 cases and 946 controls, comes from the MCC-Spain study and includes only environmental and genetic features. Quantifying the stability of feature selection methods is likewise essential before trying to gain insight into the data. The paper therefore assesses several feature selection algorithms in terms of the performance of a set of predictive models, and quantifies their robustness in two ways: the similarity between the rankings produced by different methods, and the stability of each method's own ranking. The ranking provided by the SVM-RFE approach leads to the best performance in terms of the area under the ROC curve (AUC). Feeding the top 47 features of this ranking to a Logistic Regression classifier achieves an AUC of 0.616, an improvement of 5.8% over the full feature set. The SVM-RFE ranking also turned out to be highly stable, as did Random Forest, whereas Relief and the wrapper approaches are quite unstable. These results show that stability and model performance should be studied together: Random Forest and SVM-RFE were the most stable algorithms, but SVM-RFE outperforms Random Forest in terms of model performance.
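The two quantities the abstract relies on, AUC and ranking stability, can be illustrated with a minimal sketch. The abstract does not say which stability measure the authors used, so the mean pairwise Jaccard similarity of top-k feature sets below is an assumption chosen for illustration, not the paper's method. The function names, feature names, labels, and scores are all hypothetical toy values and are not taken from the MCC-Spain data.

```python
from itertools import combinations


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_stability(rankings, k):
    """Mean pairwise Jaccard similarity between the top-k sets of rankings.

    Assumed stability measure for illustration: 1.0 means every ranking
    selects the same k features, values near 0 mean little overlap.
    """
    tops = [set(r[:k]) for r in rankings]
    pairs = list(combinations(tops, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical toy predictions: 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = auc(labels, scores)

# Hypothetical rankings from three resamples of the same data
# (feature names are invented, best-ranked feature first).
rankings = [
    ["age", "bmi", "snp_1", "alcohol", "snp_2"],
    ["bmi", "age", "snp_1", "snp_2", "alcohol"],
    ["age", "snp_1", "alcohol", "bmi", "snp_2"],
]
toy_stability = top_k_stability(rankings, k=3)

print(f"AUC = {toy_auc:.3f}, top-3 stability = {toy_stability:.3f}")
```

In a study like this one, the rankings would come from running one selector (for example SVM-RFE) on resampled training sets, and the scores from a classifier trained on the top-ranked features. Reporting the two numbers side by side is what lets a method that is stable but less accurate be distinguished from one that is both.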

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8535206
DOI: http://dx.doi.org/10.3390/ijerph182010670
