RENANO: a REference-based compressor for NANOpore FASTQ files.

Bioinformatics

Facultad de Ingeniería, Universidad de la República, Montevideo, 11300, Uruguay.

Published: December 2021

Motivation: Nanopore sequencing technologies are rapidly gaining popularity, in part, due to the massive amounts of genomic data they produce in short periods of time (up to 8.5 TB of data in <72 h). To reduce the costs of transmission and storage, efficient compression methods for this type of data are needed.

Results: We introduce RENANO, a reference-based lossless data compressor specifically tailored to FASTQ files generated with nanopore sequencing technologies. RENANO improves on its predecessor ENANO, currently the state of the art, by providing a more efficient base call sequence compression component. Two compression algorithms are introduced, corresponding to the following scenarios: (1) a reference genome is available without cost to both the compressor and the decompressor and (2) the reference genome is available only on the compressor side, and a compacted version of the reference is included in the compressed file. We compare the compression performance of RENANO against ENANO on several publicly available nanopore datasets. RENANO improves the base call sequences compression of ENANO by 39.8% in scenario (1), and by 33.5% in scenario (2), on average, over all the datasets. As for total file compression, the average improvements are 12.7% and 10.6%, respectively. We also show that RENANO consistently outperforms the recent general-purpose genomic compressor Genozip.

Availability And Implementation: RENANO is freely available for download at: https://github.com/guilledufort/RENANO.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Download full-text PDF

Source
http://dx.doi.org/10.1093/bioinformatics/btab437DOI Listing

Publication Analysis

Top Keywords

renano reference-based
4
reference-based compressor
4
compressor nanopore
4
nanopore fastq
4
fastq files
4
files motivation
4
motivation nanopore
4
nanopore sequencing
4
sequencing technologies
4
technologies rapidly
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!