A greedy feature selection algorithm for Big Data of high dimensionality.

Ioannis Tsamardinos, Giorgos Borboudakis, Pavlos Katsogridakis, Polyvios Pratikakis, Vassilis Christophides

Mach Learn

Computer Science Department, University of Crete, Heraklion, Greece.

Published: August 2018

We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) for Big Data of high dimensionality. PFBP partitions the data matrix both by rows (samples) and by columns (features). By employing p-values of conditional independence tests and meta-analysis techniques, PFBP relies only on computations local to each partition while minimizing communication costs, thus massively parallelizing computations. Similar techniques for combining local computations are also employed to create the final predictive model. PFBP employs asymptotically sound heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions representable by a causal network (a Bayesian network or a maximal ancestral graph). Empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and the number of processing cores. An extensive comparative evaluation also demonstrates the effectiveness of PFBP against other algorithms in its class. The heuristics presented are general and could potentially be applied to other greedy FS algorithms. An application to simulated Single Nucleotide Polymorphism (SNP) data with 500K samples is provided as a use case.
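The abstract gives no code, so the following is only a minimal sketch of two of the ideas it names, under stated assumptions. Per-partition p-values are combined with Fisher's combined probability test, a standard meta-analysis technique chosen here for illustration, and a single forward step then applies an Early Dropping style filter before picking the winner with the smallest combined p-value. The names `fisher_combine` and `forward_step`, the example p-values, and `alpha = 0.05` are hypothetical. Running the local conditional independence tests themselves (e.g. fitting models on each row partition) is out of scope; their p-values are taken as given.

```python
import math
from typing import Dict, List, Optional, Tuple


def fisher_combine(p_values: List[float]) -> float:
    """Combine independent p-values with Fisher's method (illustrative choice).

    Under the null, X = -2 * sum(ln p_i) is chi-square with 2k degrees of
    freedom. For even degrees of freedom the survival function has the closed
    form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no SciPy is needed.
    Assumes every p-value lies in (0, 1].
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return min(1.0, math.exp(-half) * total)


def forward_step(
    local_pvalues: Dict[str, List[float]], alpha: float
) -> Tuple[Optional[str], List[str]]:
    """One hypothetical forward iteration with an Early Dropping style filter.

    local_pvalues maps each candidate feature to the p-values of its
    conditional independence test (given the already selected features),
    one p-value per row partition. Features whose combined p-value exceeds
    alpha are dropped from all later iterations; the survivor with the
    smallest combined p-value is selected.
    """
    combined = {f: fisher_combine(ps) for f, ps in local_pvalues.items()}
    survivors = {f: p for f, p in combined.items() if p <= alpha}
    if not survivors:
        return None, []
    winner = min(survivors, key=survivors.get)
    kept = sorted(f for f in survivors if f != winner)
    return winner, kept


# Hypothetical p-values from three row partitions for three candidate features.
local = {
    "x1": [0.001, 0.004, 0.002],
    "x2": [0.30, 0.55, 0.41],
    "x3": [0.02, 0.04, 0.03],
}
winner, kept = forward_step(local, alpha=0.05)
print(winner, kept)  # x1 ['x3']  (x2 is dropped early)
```

Only p-values (one float per feature per partition) would need to travel between workers in this sketch, which is the kind of low-communication combination the abstract describes; the actual aggregation rule and heuristics used by PFBP are specified in the full paper.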

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6399683
DOI: http://dx.doi.org/10.1007/s10994-018-5748-7
