A decision support system (DSS) that classifies various cancers helps clinicians and researchers make better decisions, which can aid early cancer diagnosis and reduce the chance of incorrect diagnosis. This work therefore aimed to design a classification model that accurately predicts 5 different cancer types across 20 cancer exomes, using the mutations identified from whole-exome cancer analysis. Initially, a basic model was designed using supervised machine learning classification algorithms, namely K-nearest neighbor (KNN), support vector machine (SVM), decision tree, naïve Bayes, and random forest (RF), among which decision tree and random forest performed better in terms of preliminary model accuracy. However, output predictions were incorrect because of low training scores. Thus, 16 essential features were selected, and model improvement was pursued using 2 approaches. All imbalanced datasets were balanced using SMOTE. In the first approach, models were trained on all features from the 20 cancer exome datasets using decision tree and random forest. On the balanced datasets, the decision tree model reached an accuracy of 77%, while the RF model improved accuracy to 82% and predicted all 5 cancer types correctly. The area under the curve (AUC) for the RF model was closer to 1 than that of the decision tree model. In the second approach, 15 datasets were used for training and 5 for testing. However, only 2 cancer types were predicted correctly. To cross-validate the RF model, the Matthews correlation coefficient (MCC) was computed. For the first approach, the MCC test and MCC cross-validation values were 0.7796 and 0.9356, respectively. Likewise, for the second approach, the MCC was 0.9365, corroborating the accuracy of the designed model. The model was deployed as a web application using Streamlit for ease of use. This study presents insights toward easier cancer classification.
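The abstract reports MCC values for a 5-class problem but does not say how they were computed. As a minimal sketch, assuming the standard multiclass generalization of MCC (the same statistic that scikit-learn's `matthews_corrcoef` returns), the calculation can be written in plain Python. The function name `multiclass_mcc` and the labels `type_A`, `type_B`, and `type_C` are hypothetical placeholders, not names or data from the study.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Illustrative helper (not the authors' code). Returns a value in
    [-1, 1]; 1 means perfect agreement, 0 means no better than chance.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correct predictions
    t_counts = Counter(y_true)                        # true count per class
    p_counts = Counter(y_pred)                        # predicted count per class
    labels = set(t_counts) | set(p_counts)

    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    cov_tt = s * s - sum(v * v for v in t_counts.values())
    cov_pp = s * s - sum(v * v for v in p_counts.values())

    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0.0 when the score is undefined
    # (for example, when every sample receives the same predicted label).
    return cov_tp / denom if denom else 0.0


# Hypothetical labels, for illustration only (not data from the study).
y_true = ["type_A", "type_A", "type_B", "type_B", "type_C", "type_C"]
y_pred = ["type_A", "type_B", "type_B", "type_B", "type_C", "type_C"]
print(round(multiclass_mcc(y_true, y_pred), 4))
```

Unlike plain accuracy, MCC accounts for how predictions are distributed across every class, which is one reason it is often preferred when class sizes differ.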

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9880585
DOI: http://dx.doi.org/10.1177/11769351221147244

Publication Analysis

Top Keywords

decision tree: 20
cancer types: 12
random forest: 12
model accuracy: 12
model: 11
cancer: 9
decision support: 8
support system: 8
supervised machine: 8
machine learning: 8
