Data Set-Adaptive Minimizer Order Reduces Memory Usage in k-Mer Counting.

J Comput Biol

Blavatnik School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel.

Published: August 2022

The rapid, continuous growth of deep sequencing experiments requires the development and improvement of many bioinformatic applications for the analysis of large sequencing data sets, including k-mer counting and assembly. Several applications reduce memory usage by binning sequences. Binning is done by using minimizer schemes, which rely on a specific order of the minimizers. It has been demonstrated that the choice of the order has a major impact on the performance of the applications. Here we introduce a method for tailoring the order to the data set. Our method repeatedly samples the data set and modifies the order so as to flatten the k-mer load distribution across minimizers. We integrated our method into Gerbil, a state-of-the-art memory-efficient k-mer counter, and were able to reduce its memory footprint by 30%-50% for large k, with only a minor increase in runtime. Our tests also showed that the orders produced by our method yielded superior results when transferred across data sets from the same species, with little or no order change. This enables memory reduction with essentially no increase in runtime.
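
The sample-and-adjust idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm or Gerbil's code: the function names (lexicographic_rank, minimizer, bin_loads, adapt_order), the lexicographic starting order, and the greedy rule of demoting the single most loaded minimizer after each sample are all assumptions chosen to keep the example short.

```python
from collections import Counter
from itertools import product


def lexicographic_rank(m):
    """Assumed starting order: every m-mer over ACGT, ranked lexicographically."""
    return {"".join(p): r for r, p in enumerate(product("ACGT", repeat=m))}


def minimizer(kmer, m, rank):
    """Return the m-mer inside `kmer` that has the smallest rank."""
    mmers = (kmer[i:i + m] for i in range(len(kmer) - m + 1))
    return min(mmers, key=rank.__getitem__)


def bin_loads(reads, k, m, rank):
    """Count how many k-mers of `reads` are assigned to each minimizer (bin)."""
    loads = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            loads[minimizer(read[i:i + k], m, rank)] += 1
    return loads


def adapt_order(samples, k, m):
    """Greedy illustration: for each sampled batch of reads, measure the bin
    loads and move the most loaded minimizer to the end of the order.
    (Hypothetical update rule, not the one from the paper.)"""
    rank = lexicographic_rank(m)
    next_rank = len(rank)  # ranks above this are "worse" than every original rank
    for reads in samples:
        loads = bin_loads(reads, k, m, rank)
        if not loads:
            continue
        heaviest = max(loads, key=loads.get)
        rank[heaviest] = next_rank  # demote: chosen only if nothing else is available
        next_rank += 1
    return rank


if __name__ == "__main__":
    reads = ["AACCAAGG", "AATTAACG"]  # toy data, one sample batch
    base = lexicographic_rank(2)
    tuned = adapt_order([reads], k=4, m=2)
    print("max load before:", max(bin_loads(reads, 4, 2, base).values()))
    print("max load after: ", max(bin_loads(reads, 4, 2, tuned).values()))
```

On the toy reads, the lexicographic order sends most k-mers to the bin of AA, and demoting AA spreads them over several bins. A real implementation would sample from a large read file and would need an update rule and stopping criterion along the lines described in the paper.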

Source
http://dx.doi.org/10.1089/cmb.2021.0599
