Breast cancer data analysis for survivability studies and prediction.

Comput Methods Programs Biomed

SMART Infrastructure Facility, Faculty of Engineering and Information Sciences, University of Wollongong, Wollongong, NSW 2500, Australia.

Published: March 2018

Background: Breast cancer is the most common cancer affecting females worldwide. Predicting breast cancer survivability is a challenging and complex research task. Existing approaches use statistical methods or supervised machine learning to assess or predict the survival prospects of patients.

Objective: The main objective of this paper is to develop a robust data analytical model that can assist in (i) better understanding breast cancer survivability in the presence of missing data, (ii) providing better insights into factors associated with patient survivability, and (iii) establishing cohorts of patients that share similar properties.

Methods: Unsupervised data mining methods, namely the self-organising map (SOM) and density-based spatial clustering of applications with noise (DBSCAN), were used to create patient cohort clusters. These clusters, with their associated patterns, were used to train a multilayer perceptron (MLP) model for improved patient survivability analysis. A large dataset available from the SEER program was used to identify patterns associated with the survivability of breast cancer patients. Information gain was computed for variable selection. All of these methods are data-driven and require little (if any) input from users or experts.
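The information gain criterion mentioned in the Methods can be illustrated with a minimal sketch. It assumes categorical variables and a binary survival label; the function names and the toy records (`stage`, `side`, `survived`) are hypothetical and are not taken from the paper or from SEER.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical variable."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average entropy of the labels within each value of the variable.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy, made-up records (assumption: not real patient data).
survived = [1, 1, 1, 1, 0, 0, 0, 0]
stage = ["I", "I", "I", "I", "IV", "IV", "IV", "IV"]   # separates the labels perfectly
side = ["L", "R", "L", "R", "L", "R", "L", "R"]        # carries no information here

gain_stage = information_gain(stage, survived)
gain_side = information_gain(side, survived)
```

In this toy case `stage` receives the maximum possible gain and `side` receives none, so a ranking by information gain would keep the former and could drop the latter.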

Results: SOM consolidated patients into groups with similar properties. From these, DBSCAN identified and extracted nine cohorts (clusters). Patients in each of the nine clusters were found to have different survival times. Separating patients into clusters improved the overall survival prediction accuracy of the MLP and revealed intricate conditions that affect the accuracy of a prediction.
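How a density-based method extracts clusters and sets aside noise can be sketched with a small, textbook-style DBSCAN. This is an illustration only, not the authors' implementation: the 2-D points stand in for positions on a trained SOM grid, and `eps` and `min_pts` are arbitrary values chosen for the toy data.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1                   # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep growing
    return labels

# Hypothetical 2-D positions: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (4, 20)]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is not specified in advance; it follows from the density parameters, which is consistent with the data-driven character described in the abstract.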

Conclusions: A new, entirely data-driven approach based on unsupervised learning methods improves understanding and helps identify patterns associated with the survivability of patients. The results of the analysis can be used to segment historical patient data into clusters or subsets that share common variable values and survivability. The survivability prediction accuracy of an MLP is improved by using the identified patient cohorts rather than raw historical data. Analysis of the variable values in each cohort provides better insights into the survivability of a particular subgroup of breast cancer patients.
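Once each patient record carries a cohort label, the per-cohort analysis of variable values described above might look like the following sketch. The record layout, the `grade` variable and all numbers are invented for illustration and do not come from the study.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (cohort label, tumour grade, survival in months).
records = [
    (0, "low", 110), (0, "low", 98), (0, "medium", 104),
    (1, "high", 20), (1, "high", 31), (1, "medium", 27),
]

def profile_cohorts(rows):
    """Summarise size, mean survival and dominant grade for each cohort."""
    by_cohort = defaultdict(list)
    for cohort, grade, months in rows:
        by_cohort[cohort].append((grade, months))
    summary = {}
    for cohort, members in by_cohort.items():
        grades = Counter(grade for grade, _ in members)
        summary[cohort] = {
            "size": len(members),
            "mean_survival_months": mean(months for _, months in members),
            "most_common_grade": grades.most_common(1)[0][0],
        }
    return summary

summary = profile_cohorts(records)
```

A summary of this kind is one simple way to read off which variable values dominate a cohort and how its survival compares with that of other cohorts.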

Source
http://dx.doi.org/10.1016/j.cmpb.2017.12.011

Publication Analysis

Top Keywords

breast cancer: 24
survivability: 11
data analysis: 8
cancer survivability: 8
survivability prediction: 8
better insights: 8
patient survivability: 8
cohorts patients: 8
identify patterns: 8
patterns associated: 8

Similar Publications

Background: Kentucky is among the five states with the highest breast cancer mortality nationwide. This study investigates the association between neighborhood socioeconomic disadvantage and breast cancer outcomes, including surgical treatment, radiation therapy, chemotherapy, and survival, and how these associations vary by race and ethnicity in Kentucky.

Methods: We conducted a retrospective cohort analysis using data from the Kentucky Cancer Registry (KCR) for breast cancer patients diagnosed between 2010 and 2017, with follow-up through December 31, 2022.

Breast cancers of the IntClust-2 type, characterized by amplification of a small portion of chromosome 11, have a median survival of only five years. Several cancer-relevant genes occupy this portion of chromosome 11, and it is thought that overexpression of a combination of driver genes in this region is responsible for the poor outcome of women in this group. In this study we used a gene editing method to knock out, one by one, each of 198 genes that are located within the amplified region of chromosome 11 and determined how much each of these genes contributed to the survival of breast cancer cells.

External Validation of a 5-Factor Risk Model for Breast Cancer-Related Lymphedema.

JAMA Netw Open

January 2025

Institute of Medical Science, Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

Importance: Secondary lymphedema is a common, harmful side effect of breast cancer treatment. Robust risk models that are externally validated are needed to facilitate clinical translation. A published risk model used 5 accessible clinical factors to predict the development of breast cancer-related lymphedema; this model included a patient's mammographic breast density as a novel predictive factor.

An azo dye was used to prepare a new series of complexes with chlorides of rhodium (Rh), ruthenium (Ru), and gold (Au). The prepared materials were characterized by infrared, ultraviolet-visible, and mass spectrometry, as well as thermogravimetric analysis, differential calorimetry, and elemental analysis. Conductivity, magnetic susceptibility, metal content, and chlorine content of the complexes were also measured.

Breastfeeding provides essential nutrition and disease protection for infants while reducing the risk of type 2 diabetes and breast cancer in mothers. Despite these benefits, significant racial and ethnic disparities exist in breastfeeding initiation, particularly among Black women. This study examines racial differences in the receipt of breastfeeding information from varying sources and their association with breastfeeding initiation.
