SAIL: Summation-bAsed Incremental Learning for Information-Theoretic Text Clustering.

IEEE Trans Cybern

Published: April 2013

Information-theoretic clustering uses information-theoretic measures as clustering criteria. A common approach is Info-Kmeans, which performs K-means clustering with KL-divergence as the proximity function. While prior work on Info-Kmeans has shown promising results, a remaining challenge is handling high-dimensional sparse data such as text corpora. For high-dimensional text vectors, centroids may contain many zero-valued features, which leads to infinite KL-divergence values and creates a dilemma when assigning objects to centroids during the Info-Kmeans iterations. To address this challenge, we propose a Summation-bAsed Incremental Learning (SAIL) algorithm for Info-Kmeans clustering. Specifically, by using an equivalent objective function, SAIL replaces the computation of KL-divergence with the incremental computation of Shannon entropy, which avoids the zero-feature dilemma caused by KL-divergence. To further improve clustering quality, we introduce a variable neighborhood search scheme, yielding the V-SAIL algorithm, which is then accelerated by a multithreaded scheme in PV-SAIL. Our experimental results on various real-world text collections show that, with SAIL as a booster, the clustering performance of Info-Kmeans can be significantly improved. V-SAIL and PV-SAIL further improve clustering quality at a lower computational cost.
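The zero-feature dilemma and the entropy-based reformulation can be illustrated with a small sketch. It assumes each document is a term distribution x with weight π_x (its share of the total term count) and that a centroid c_k is the π-weighted mean of its members. Under these assumptions, the standard identity Σ_k Σ_{x∈C_k} π_x KL(x‖c_k) = Σ_k π_k H(c_k) − Σ_x π_x H(x) holds, and the last term does not depend on the partition. This identity is offered as one way to obtain an "equivalent objective function"; the paper's exact formulation may differ, and the helper names (kl, entropy, centroid, kl_objective, entropy_objective) are hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) in nats; infinite when q_j = 0 but p_j > 0."""
    total = 0.0
    for pj, qj in zip(p, q):
        if pj == 0.0:
            continue  # convention: 0 * log(0 / q_j) = 0
        if qj == 0.0:
            return math.inf  # the zero-feature dilemma
        total += pj * math.log(pj / qj)
    return total

def entropy(p):
    """Shannon entropy in nats, using 0 * log 0 = 0."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0.0)

def centroid(docs, weights):
    """Weighted mean of the member distributions, plus the cluster weight pi_k."""
    w = sum(weights)
    c = [sum(wt * d[j] for d, wt in zip(docs, weights)) / w
         for j in range(len(docs[0]))]
    return c, w

def kl_objective(clusters):
    """Info-Kmeans objective: sum_k sum_{x in C_k} pi_x * KL(x || c_k)."""
    total = 0.0
    for docs, weights in clusters:
        c, _ = centroid(docs, weights)
        total += sum(wt * kl(d, c) for d, wt in zip(docs, weights))
    return total

def entropy_objective(clusters):
    """Same value via entropies: sum_k pi_k H(c_k) - sum_x pi_x H(x)."""
    total = 0.0
    for docs, weights in clusters:
        c, w = centroid(docs, weights)
        total += w * entropy(c)
        total -= sum(wt * entropy(d) for d, wt in zip(docs, weights))
    return total

# Toy data: two clusters over a 4-term vocabulary; rows are term distributions
# and the weights pi_x sum to 1 over all documents.
cluster_a = ([[0.5, 0.5, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]], [0.25, 0.25])
cluster_b = ([[0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 1.0, 0.0]], [0.25, 0.25])
clusters = [cluster_a, cluster_b]

c_b, _ = centroid(*cluster_b)
print(kl(cluster_a[0][0], c_b))  # inf: c_b is zero on terms 0 and 1
print(kl_objective(clusters), entropy_objective(clusters))  # equal up to rounding
```

Comparing a document against a foreign centroid yields an infinite KL value, whereas the entropy form only evaluates entropies of centroids and documents, where zero entries simply contribute 0.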

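The "summation-based incremental" idea can be sketched under the usual assumptions (documents as term distributions weighted by their share of the total term count, centroids as weighted means). Under these assumptions, the Info-Kmeans objective equals Σ_k π_k H(c_k) up to a partition-independent constant. If each cluster keeps only the running sums S_kj of its members' weighted term masses and its weight π_k = Σ_j S_kj, then π_k H(c_k) = π_k log π_k − Σ_j S_kj log S_kj. This value can be updated when one document moves by touching only that document's non-zero terms. This is a hypothetical reconstruction for illustration, not the authors' SAIL implementation. The function names, the balanced random initialization, the empty-cluster guard and the stopping rule are all assumptions, and neither V-SAIL's variable neighborhood search nor PV-SAIL's multithreading is shown.

```python
import math
import random
from collections import defaultdict

def xlogx(t):
    """t * log(t) with 0 * log 0 = 0; tiny negative round-off is treated as 0."""
    return t * math.log(t) if t > 0.0 else 0.0

def objective(docs, labels, k):
    """Full recomputation of sum_k pi_k H(c_k), used only for checking."""
    total = sum(sum(d.values()) for d in docs)
    sums = [defaultdict(float) for _ in range(k)]
    for d, lab in zip(docs, labels):
        for t, c in d.items():
            sums[lab][t] += c / total
    return sum(xlogx(sum(s.values())) - sum(xlogx(v) for v in s.values())
               for s in sums)

def sail_like_sweeps(docs, k, max_sweeps=50, seed=0):
    """Hypothetical sketch of incremental, summation-based Info-Kmeans.

    docs: sparse term-count dicts {term: count}; returns one label per doc.
    """
    total = sum(sum(d.values()) for d in docs)
    # Weighted term mass m_j = pi_x * x_j = count_j / total; pi_x = sum_j m_j.
    masses = [{t: c / total for t, c in d.items()} for d in docs]
    weights = [sum(m.values()) for m in masses]

    # Balanced random initial partition (an assumption of this sketch).
    labels = [i % k for i in range(len(docs))]
    random.Random(seed).shuffle(labels)
    sums = [defaultdict(float) for _ in range(k)]  # S_kj per cluster
    pis = [0.0] * k                                 # pi_k per cluster
    sizes = [0] * k
    for m, w, lab in zip(masses, weights, labels):
        for t, v in m.items():
            sums[lab][t] += v
        pis[lab] += w
        sizes[lab] += 1

    def delta(c, m, w, sign):
        """Change in pi_c * H(c_c) when a doc is added (+1) or removed (-1).

        Only the doc's non-zero terms are touched, so the cost is O(nnz(x)).
        """
        s = sums[c]
        d = xlogx(pis[c] + sign * w) - xlogx(pis[c])
        for t, v in m.items():
            d -= xlogx(s.get(t, 0.0) + sign * v) - xlogx(s.get(t, 0.0))
        return d

    for _ in range(max_sweeps):
        moved = False
        for i, (m, w) in enumerate(zip(masses, weights)):
            a = labels[i]
            if sizes[a] == 1:
                continue  # keep every cluster non-empty
            removal = delta(a, m, w, -1)
            best, best_change = a, 0.0
            for b in range(k):
                if b != a:
                    change = removal + delta(b, m, w, +1)
                    if change < best_change - 1e-15:
                        best, best_change = b, change
            if best != a:  # move immediately and update the running sums
                for t, v in m.items():
                    sums[a][t] -= v
                    sums[best][t] += v
                pis[a] -= w
                pis[best] += w
                sizes[a] -= 1
                sizes[best] += 1
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels

docs = [
    {"goal": 3, "match": 2}, {"goal": 1, "team": 2}, {"match": 1, "team": 1},
    {"vote": 2, "poll": 3}, {"vote": 1, "party": 2}, {"poll": 1, "party": 1},
]
labels = sail_like_sweeps(docs, k=2)
print(labels, objective(docs, labels, 2))
```

No centroid is ever divided out and no KL value is evaluated, so zero-valued centroid features never cause an infinity. Each move costs time proportional to the document's number of distinct terms. The `objective` helper recomputes the value from scratch and exists only to check the incremental updates.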
DOI: http://dx.doi.org/10.1109/TSMCB.2012.2212430

Authors: Jie Cao, Zhiang Wu, Junjie Wu, Hui Xiong