Exploiting topic modeling to boost metagenomic reads binning.

Ruichang Zhang, Zhanzhan Cheng, Jihong Guan, Shuigeng Zhou

BMC Bioinformatics

Published: July 2015

Background: With the rapid development of high-throughput sequencing technologies, researchers can sequence the whole metagenome of a microbial community sampled directly from the environment. Assigning these metagenomic reads to different species or taxonomic classes, a step referred to as binning of metagenomic data, is vital for metagenomic analysis.

Results: In this paper, we propose TM-MCluster, a new method for binning metagenomic reads. First, we represent each metagenomic read as a set of "k-mers" together with their frequencies in the read. Then, we apply a probabilistic topic model, Latent Dirichlet Allocation (LDA), to the reads; LDA generates a number of hidden "topics" so that each read can be represented by a distribution vector over these topics. Finally, as in the MCluster method, we cluster the reads, now represented by their topic distributions, with SKWIC, a variant of the classical K-means algorithm with an automatic feature-weighting mechanism.
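The first two steps can be illustrated with a minimal, standard-library-only sketch. It assumes a read is a plain A/C/G/T string, treats each read as a "document" whose "words" are its overlapping k-mers, and fits LDA with collapsed Gibbs sampling, which is one standard LDA inference method; the abstract does not say which inference method TM-MCluster uses. The function names `kmer_counts` and `lda_gibbs`, the default k=4, and the values of `alpha`, `beta`, and `iters` are hypothetical illustrative choices, not parameters taken from the paper.

```python
import random
from collections import Counter


def kmer_counts(read: str, k: int = 4) -> Counter:
    """Bag-of-k-mers for one read: each overlapping length-k substring and its count.

    Hypothetical helper; k=4 is illustrative, not necessarily TM-MCluster's value.
    K-mers containing characters other than A/C/G/T (e.g. 'N') are skipped.
    """
    read = read.upper()
    counts = Counter()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over bag-of-k-mer documents (hypothetical helper).

    docs: list of Counter objects (k-mer -> count), one per read.
    Returns theta: one topic distribution (floats summing to 1) per read.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: j for j, w in enumerate(vocab)}
    V, T = len(vocab), num_topics
    # Expand each read into a list of vocabulary indices, one entry per k-mer occurrence.
    tokens = [[index[w] for w, c in d.items() for _ in range(c)] for d in docs]
    n_dt = [[0] * T for _ in docs]      # tokens in read d assigned to topic t
    n_tw = [[0] * V for _ in range(T)]  # occurrences of k-mer w assigned to topic t
    n_t = [0] * T                       # total tokens assigned to topic t
    z = []                              # current topic of every token
    for d, toks in enumerate(tokens):
        zd = []
        for w in toks:
            t = rng.randrange(T)
            zd.append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, toks in enumerate(tokens):
            for i, w in enumerate(toks):
                t = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # Full conditional p(z = s | all other assignments), up to a constant.
                weights = [(n_dt[d][s] + alpha) * (n_tw[s][w] + beta) / (n_t[s] + V * beta)
                           for s in range(T)]
                t = rng.choices(range(T), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1
    # Smoothed estimate of each read's topic distribution.
    theta = []
    for d, toks in enumerate(tokens):
        denom = len(toks) + T * alpha
        theta.append([(n_dt[d][t] + alpha) / denom for t in range(T)])
    return theta
```

For example, `lda_gibbs([kmer_counts(r) for r in reads], num_topics=2)` returns one topic vector per read. In TM-MCluster these vectors are the input to the final SKWIC clustering step, which this sketch does not reproduce.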

Conclusions: Experiments show that TM-MCluster outperforms major existing methods, including AbundanceBin, MetaCluster 3.0/5.0 and MCluster. This result indicates that exploiting topic modeling can effectively improve the binning performance of metagenomic reads.

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4402587
DOI: http://dx.doi.org/10.1186/1471-2105-16-S5-S2
