Background: Data mining techniques are used to extract previously unknown knowledge from large data sets. Microarray gene expression (MGE) data plays a major role in predicting the type of cancer. Because MGE data is very large in volume, applying traditional data mining approaches to it is time-consuming. Parallel programming frameworks such as Hadoop, Spark and Mahout are therefore needed to ease the computational burden.

Objective: Not all gene expressions are needed for prediction, so selecting the important genes is essential for improving classification accuracy. Feature selection algorithms are therefore parallelized and executed on the Spark framework to eliminate unnecessary genes and identify only the predictive genes in much less time, without affecting prediction accuracy.

Methods: A parallelized hybrid feature selection (HFS) method is proposed for this purpose. It applies parallelized correlation feature subset selection, followed by rank-based feature selection methods. The selected subset of genes is evaluated using parallel classification algorithms. The resulting accuracy values are compared with those of the existing rank-weight feature selection and parallelized recursive feature selection methods, and with the values obtained by executing parallelized HFS on DistributedWekaSpark.
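
The two-stage idea can be illustrated with a minimal single-machine sketch. It is not the authors' implementation and does not use Spark: the function names, thresholds, toy data, and the choice of a signal-to-noise score for the ranking stage are all assumptions made for illustration. Stage 1 is a simplified correlation filter (keep genes correlated with the class label, drop near-duplicates of genes already kept), and stage 2 ranks the survivors. The per-gene scores are independent of one another, which is what makes this kind of computation a natural fit for a parallel framework.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_filter(genes, labels, min_relevance=0.3, max_redundancy=0.9):
    """Stage 1 (assumed, simplified): keep label-correlated, non-redundant genes."""
    relevance = {g: abs(pearson(v, labels)) for g, v in genes.items()}
    candidates = sorted(
        (g for g in genes if relevance[g] >= min_relevance),
        key=lambda g: (-relevance[g], g),
    )
    kept = []
    for g in candidates:
        # Skip a gene that nearly duplicates one we have already kept.
        if all(abs(pearson(genes[g], genes[h])) < max_redundancy for h in kept):
            kept.append(g)
    return kept


def signal_to_noise(values, labels):
    """Stage 2 score (assumed): |difference of class means| / sum of class std devs."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    gap = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def hybrid_select(genes, labels, top_k):
    """Run the correlation filter, then rank the survivors and keep the top_k."""
    kept = correlation_filter(genes, labels)
    ranked = sorted(kept, key=lambda g: (-signal_to_noise(genes[g], labels), g))
    return ranked[:top_k]


# Hypothetical toy data: 6 samples, binary class labels, 4 made-up genes.
TOY_LABELS = [0, 0, 0, 1, 1, 1]
TOY_GENES = {
    "g1": [1, 2, 1, 8, 9, 8],  # strongly predictive
    "g2": [1, 2, 2, 8, 9, 8],  # near-duplicate of g1 (redundant)
    "g3": [5, 5, 6, 5, 6, 5],  # uninformative
    "g4": [1, 5, 3, 6, 4, 9],  # moderately predictive, not redundant
}

print(hybrid_select(TOY_GENES, TOY_LABELS, top_k=2))
```

On this toy data the filter discards the redundant and the uninformative gene before ranking, so the ranking stage only has to score the two remaining candidates.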

Results: The classification accuracy obtained with the proposed parallelized HFS method is 97% for gastric cancer and 79% for childhood leukemia. The method improved classification accuracy by approximately 4% to 15% compared with previous methods.

Conclusion: The results show that the proposed parallelized feature selection algorithm scales to growing medical data and predicts cancer sub-types in less time and with higher accuracy.

Source
http://dx.doi.org/10.1007/s13258-019-00859-x

