In data analysis, data scientists usually focus on the size of the data rather than on feature selection. Owing to the rapid growth of internet resources, data are growing exponentially with ever more features, which leads to big-data dimensionality problems. A high volume of features contains much redundant data, which can degrade classification accuracy. In the current scenario, feature selection has therefore attracted the research community as a means of identifying and removing irrelevant features with greater scalability and accuracy. To address this, in this research study we present a novel feature selection framework implemented on the Hadoop and Apache Spark platforms. The proposed model combines rough sets and the differential evolution (DE) algorithm: rough sets are used to find a minimal feature subset, but because rough sets do not account for the degree of overlap in the data, the DE algorithm is then used to find the most optimal features. The proposed model is evaluated with Random Forest and Naive Bayes classifiers on five well-known data sets and compared with existing feature selection models from the literature. The results show that the proposed model performs well in terms of scalability and accuracy.
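
The abstract describes a two-stage pipeline: rough sets reduce the feature space, and DE then searches for the best subset, scored with Random Forest or Naive Bayes. As a rough, single-machine illustration of the DE stage only (not the authors' distributed Hadoop/Spark implementation, and without the rough-set reduction), the sketch below runs a binary DE feature search using cross-validated Random Forest accuracy as the fitness; the dataset, DE parameters, and fitness definition are assumptions, not taken from the paper.

```python
# Minimal sketch of differential-evolution (DE) feature selection with a
# Random Forest fitness. Dataset, parameters, and fitness are assumptions;
# the paper's rough-set stage and Spark distribution are not reproduced here.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    """Cross-validated accuracy of a Random Forest on the selected features."""
    if not mask.any():
        return 0.0
    clf = RandomForestClassifier(n_estimators=30, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# DE over continuous vectors in [0, 1]; a feature is selected when its value > 0.5.
pop_size, n_gen, F, CR = 15, 15, 0.8, 0.9
pop = rng.random((pop_size, n_features))
scores = np.array([fitness(ind > 0.5) for ind in pop])

for _ in range(n_gen):
    for i in range(pop_size):
        # Pick three distinct individuals other than i for mutation.
        a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0.0, 1.0)
        # Binomial crossover; force at least one gene to come from the mutant.
        cross = rng.random(n_features) < CR
        cross[rng.integers(n_features)] = True
        trial = np.where(cross, mutant, pop[i])
        trial_score = fitness(trial > 0.5)
        if trial_score >= scores[i]:  # greedy selection
            pop[i], scores[i] = trial, trial_score

best = pop[scores.argmax()] > 0.5
print(f"selected {best.sum()} of {n_features} features, CV accuracy {scores.max():.3f}")
```

In the paper's setting, the expensive fitness evaluations would be parallelized across the Spark cluster; this sketch evaluates them sequentially for clarity.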

Source DOI: http://dx.doi.org/10.1089/big.2021.0267

Publication Analysis

Top Keywords

feature selection: 16
proposed model: 12
rough sets: 12
data: 9
differential evolution: 8
evolution algorithm: 8
big data: 8
scalability accuracy: 8
features: 6
feature: 5
