Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
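
To make the role of rank queries concrete, here is a minimal textbook sketch of an FM index: a Burrows-Wheeler transform plus backward search that counts exact matches. It is not Centrifuger's implementation. The rank table here is stored uncompressed, which is exactly the part a scheme like run-block compression would replace, and the names `bwt`, `build_index`, and `count_matches` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy version, not scalable)."""
    s = text + "$"  # sentinel; '$' sorts before A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def build_index(bwt_str):
    """Build the two FM-index tables from a BWT string."""
    counts = Counter(bwt_str)
    # C[c]: number of symbols in the text strictly smaller than c
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: rank of c at position i, i.e. occurrences of c in bwt_str[:i].
    # Stored in full here; a compressed index would answer this query
    # from a much smaller structure.
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_matches(pattern, C, occ, n):
    """Count exact occurrences of pattern using backward search."""
    lo, hi = 0, n  # half-open interval of sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        # Two rank queries narrow the interval for each pattern symbol.
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


genome = "ACGTACGA"
transformed = bwt(genome)
C, occ = build_index(transformed)
hits = count_matches("ACG", C, occ, len(transformed))
```

Each pattern symbol costs two rank lookups, so the speed of the rank query bounds the speed of the search, and nothing in the loop limits how long a match can grow.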


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11046777
DOI: http://dx.doi.org/10.1186/s13059-024-03244-4

