We propose a polynomial-time algorithm that computes a minimum plain-text representation of k-mer sets, as well as an efficient greedy heuristic that produces a near-minimum representation. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor increase in runtime, we shrink the representation by up to 59% over unitigs and 26% over previous work. The number of strings decreases by up to 97% over unitigs and 90% over previous work. A smaller representation also benefits downstream applications: it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.
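
To make "plain-text representation of a k-mer set" concrete, the following minimal sketch checks that two string sets contain exactly the same k-mers and compares their cumulative lengths. It is only an illustration, not the algorithm or heuristic proposed in the paper. The helper names `kmers_of` and `cumulative_length` and the toy sequences are hypothetical, and reverse complements are ignored for simplicity.

```python
def kmers_of(strings, k):
    """Return the set of all length-k substrings occurring in the given strings."""
    return {s[i:i + k] for s in strings for i in range(len(s) - k + 1)}


def cumulative_length(strings):
    """Total number of characters across all strings."""
    return sum(len(s) for s in strings)


k = 3
# Toy data (assumption): the same three 3-mers stored in two different ways.
one_per_kmer = ["ACG", "CGT", "GTA"]  # each k-mer kept as its own string
merged = ["ACGTA"]                    # overlapping k-mers glued into one string

# Both string sets represent exactly the same k-mer set.
same_set = kmers_of(one_per_kmer, k) == kmers_of(merged, k)

# Fraction of characters saved by the merged representation.
saving = 1 - cumulative_length(merged) / cumulative_length(one_per_kmer)

print(same_set, cumulative_length(one_per_kmer), cumulative_length(merged))
```

Here the merged form needs one string of 5 characters instead of three strings totalling 9 characters, which is the kind of reduction in both cumulative length and string count that the abstract reports at scale.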

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10251615
DOI: http://dx.doi.org/10.1186/s13059-023-02968-z
