Simple strategies for semi-supervised feature selection.

Mach Learn

School of Computer Science, University of Manchester, Manchester, M13 9PL UK.

Published: July 2017

What is the simplest thing you can do to solve a problem? In the context of semi-supervised feature selection, we tackle exactly this: how much can we gain from two simple strategies? If we have some binary labelled data and some unlabelled, we could assume the unlabelled data are all positives, or assume them all negatives. These minimalist, seemingly naive, approaches have not previously been studied in depth. However, with theoretical and empirical studies, we show they provide powerful results for feature selection, via hypothesis testing and feature ranking. Combining them with some "soft" prior knowledge of the domain, we derive two novel algorithms (Semi-JMI, Semi-IAMB) that outperform significantly more complex competing methods, showing particularly good performance when the labels are missing-not-at-random. We conclude that simple approaches to this problem can work surprisingly well, and in many situations we can provably recover the exact feature selection dynamics.
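The core idea of the two strategies can be sketched in a few lines: treat every unlabelled example as a positive (or as a negative), then rank features by their empirical mutual information with the resulting surrogate labels. The sketch below is illustrative only, not the authors' implementation; the function names and toy data are assumptions, and mutual information is estimated by simple plug-in counting.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in empirical mutual information (in nats) between two
    discrete sequences of equal length."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), with counts c, px, py
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def rank_features(X_lab, y_lab, X_unl, assume_positive=True):
    """Rank feature indices by mutual information with surrogate labels:
    every unlabelled example is assumed positive (or negative)."""
    surrogate = 1 if assume_positive else 0
    X = list(X_lab) + list(X_unl)
    y = list(y_lab) + [surrogate] * len(X_unl)
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(len(X[0]))]
    # Highest-scoring features first
    return sorted(range(len(scores)), key=lambda j: -scores[j])
```

On a toy dataset where feature 0 tracks the label and feature 1 is noise, both surrogate labellings still rank feature 0 first, which is the effect the abstract describes: even these naive assumptions preserve useful ranking signal.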

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6954040
DOI: http://dx.doi.org/10.1007/s10994-017-5648-2
