kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers.

Peng Jiang Jie Luo Yiqi Wang Pingji Deng Bertil Schmidt Xiangjun Tang Ningjiang Chen Limsoon Wong Liang Zhao

Bioinformatics

Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, Hubei, China.

Published: December 2019

Motivation: K-mers along with their frequency have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive studies in k-mer counting. However, the output of k-mer counters itself is large; very often, it is too large to fit into main memory, leading to highly narrowed usability.

Results: We introduce a novel idea of encoding k-mers as well as their frequency, achieving good memory saving and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure to encode counted k-mers by coupled-bit arrays-one for k-mer representation and the other for frequency encoding. Experiments on five real datasets show that the average memory-saving ratio on all 31-mers is as high as 13.81 as compared with raw input, with 7 hash functions. At the same time, the retrieval time complexity is well controlled (effectively constant), and the false-positive rate is decreased by two orders of magnitude.

Availability And Implementation: The source codes of our algorithm are available at github.com/lzhLab/kmcEx.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Download full-text PDF	Source
http://dx.doi.org/10.1093/bioinformatics/btz299	DOI Listing

Publication Analysis

Top Keywords

counted k-mers

kmcex memory-frugal

memory-frugal retrieval-efficient

retrieval-efficient encoding

encoding counted

k-mers

k-mers motivation

motivation k-mers

k-mers frequency

frequency served

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!