JARVIS3: an efficient encoder for genomic data.

Bioinformatics

Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal.

Published: November 2024

Motivation: Large-scale genomic projects grapple with the complex challenge of reducing medium- and long-term storage space and its associated energy consumption, monetary costs, and environmental footprint.

Results: We present JARVIS3, an advanced tool engineered for the efficient reference-free compression of genomic sequences. JARVIS3 introduces a pioneering approach, specifically through enhanced table memory models and probabilistic lookup-tables applied in repeat models. These optimizations are pivotal in substantially enhancing computational efficiency. JARVIS3 offers three distinct profiles: (i) rapid computation with moderate compression, (ii) a balanced trade-off between time and compression, and (iii) slower computation with significantly higher compression ratios. The implementation of JARVIS3 is rooted in the C programming language, building upon the success of its predecessor, JARVIS2. JARVIS3 shows substantial speed improvements relative to JARVIS2 while providing slightly better compression. Furthermore, we provide a versatile C/Bash implementation, facilitating the application in FASTA and FASTQ data, including the capability for parallel computation. In addition, JARVIS3 includes a mode for outputting bit information, as well as providing the Normalized Compression and bit rates, facilitating compression-based analysis. This establishes JARVIS3 as an open-source solution for genomic data compression and analysis.
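The bit rate and Normalized Compression mentioned above can be illustrated with a minimal sketch. It assumes the definition commonly used in compression-based analysis: the bit rate is the number of compressed bits per symbol, and the Normalized Compression is that rate divided by log2 of the alphabet size (2 bits for the four-letter DNA alphabet). The function names and the example sizes are hypothetical and are not taken from the JARVIS3 code base; the values JARVIS3 reports may be computed differently.

```python
import math


def bits_per_base(compressed_bytes: int, n_symbols: int) -> float:
    """Average number of compressed bits spent per input symbol."""
    if n_symbols <= 0:
        raise ValueError("n_symbols must be positive")
    return 8.0 * compressed_bytes / n_symbols


def normalized_compression(compressed_bytes: int, n_symbols: int,
                           alphabet_size: int = 4) -> float:
    """Bit rate divided by log2(alphabet size).

    Assumed definition: NC = C(x) / (|x| * log2(|A|)), where C(x) is the
    compressed size in bits, |x| the sequence length, and |A| the alphabet
    size. Values near 1 suggest little exploitable redundancy; values near 0
    suggest high redundancy.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    return bits_per_base(compressed_bytes, n_symbols) / math.log2(alphabet_size)


if __name__ == "__main__":
    # Hypothetical example: 1,000,000 bases compressed to 200,000 bytes.
    print(bits_per_base(200_000, 1_000_000))
    print(normalized_compression(200_000, 1_000_000))
```

Under this definition, a DNA sequence stored at exactly 2 bits per base has a Normalized Compression of 1, so lower values indicate that the compressor found structure beyond the plain four-symbol encoding.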

Availability And Implementation: JARVIS3 is freely available at https://github.com/cobilab/jarvis3.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11645547
DOI: http://dx.doi.org/10.1093/bioinformatics/btae725
