With the increasing throughput of modern sequencing instruments, the cost of storing and transmitting sequencing data has also increased dramatically. Although many tools have been developed to compress sequencing data, there is still a need for a compressor with a higher compression ratio. In this paper, we present a two-step framework for compressing sequencing data. The first step repacks the original data into a binary stream, and the second step compresses that stream with an LZMA encoder. We develop a new strategy for encoding the original file into a stream that is highly compressible by LZMA. In addition, an FPGA-accelerated implementation of LZMA was developed to speed up the second step. As a demonstration, we present repaq, a lossless non-reference compressor for FASTQ format files. We introduce a multifile redundancy elimination method, which is very useful for compressing paired-end sequencing data. According to our test results, the compression ratio of repaq is much higher than that of other FASTQ compressors. For some deep sequencing data, the compression ratio of repaq can exceed 25, almost four times that of Gzip. The framework presented in this paper can also be applied to develop new tools for compressing other sequencing data. The open-source code of repaq is available at https://github.com/OpenGene/repaq.
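The two-step idea (repack, then LZMA) can be illustrated with a minimal Python sketch. This is not repaq's actual file format or encoding strategy, which the abstract does not describe. The 2-bit base code, the 4-byte length header, and all function names below are assumptions made purely for illustration, and the sketch handles only sequences made of A, C, G and T.

```python
import lzma

# Hypothetical 2-bit code for the four canonical bases (an assumption,
# not repaq's real layout).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Step 1 (illustrative): repack an ACGT-only sequence at 2 bits per base."""
    # A 4-byte little-endian length header lets the decoder drop padding bits.
    out = bytearray(len(seq).to_bytes(4, "little"))
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= BASE_TO_BITS[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes) -> str:
    """Inverse of pack_bases."""
    n = int.from_bytes(data[:4], "little")
    bases = []
    for k in range(n):
        byte = data[4 + k // 4]
        bases.append(BITS_TO_BASE[(byte >> (2 * (k % 4))) & 3])
    return "".join(bases)


def compress_reads(seq: str) -> bytes:
    """Step 2: compress the repacked binary stream with LZMA."""
    return lzma.compress(pack_bases(seq), preset=9)


def decompress_reads(blob: bytes) -> str:
    """Undo both steps and recover the original sequence losslessly."""
    return unpack_bases(lzma.decompress(blob))
```

In this sketch the compression ratio would be the original size divided by `len(compress_reads(seq))`. A real FASTQ compressor also has to handle read names, quality strings and ambiguous bases such as N, none of which are covered here.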
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552150 | PMC |
| http://dx.doi.org/10.3389/fgene.2023.1260531 | DOI Listing |
Int J Syst Evol Microbiol
January 2025
China National Research Institute of Food & Fermentation Industries, China Center of Industrial Culture Collection, Beijing 100015, PR China.
is widely used as a starter culture in the production of cheese, yoghurt and various cultured dairy products, which holds considerable significance in both research and practical applications within the food industry. Throughout history, the taxonomy of has undergone several adjustments and revisions. In 1984, based on the result of DNA-DNA hybridization, was reclassified as subsp.
Comput Methods Biomech Biomed Engin
January 2025
Department of Gastroenterology, The Second Affiliated Hospital of Chengdu Medical College, China National Nuclear Corporation 416 Hospital, Chengdu, China.
The global rise in Crohn's Disease (CD) incidence has intensified diagnostic challenges. This study identified circadian rhythm-related biomarkers for CD using datasets from the GEO database. Differentially expressed genes underwent Weighted Gene Co-Expression Network Analysis, with 49 hub genes intersected from GeneCards data.
Infection
January 2025
Department of Clinical Infectious Diseases, Research Center Borstel, Leibniz Lung Center, Parkallee 35, Borstel, Germany.
Purpose: Deciding whether to provide preventive treatment to contacts of individuals with multidrug-resistant (MDR) tuberculosis is complex.
Methods: We present the diagnostic pathways, clinical course and outcome of tuberculosis treatment in eight siblings from a single family. Tuberculosis disease was diagnosed by Mycobacterium tuberculosis culture and molecular detection of M.
Aging Clin Exp Res
January 2025
Department of Joint Surgery, HongHui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi, 710054, China.
Objective: Osteoarthritis (OA) represents a condition under the influence of central nervous system (CNS) regulatory mechanisms. This investigation aims to examine the causal association between viral infections of the central nervous system (VICNS) and inflammatory diseases of the central nervous system (IDCNS) and knee osteoarthritis (KOA) at the genetic level.
Methods: In this investigation, VICNS and IDCNS were considered as primary exposure variables, while KOA served as the primary outcome.
Neuroinformatics
January 2025
Laboratory for Applied Genomics and Bioinnovations, Instituto Oswaldo Cruz - Fiocruz, Rio de Janeiro, RJ, Brazil.
Multiple sclerosis (MS) is a neurological disease causing myelin and axon damage through inflammatory and autoimmune processes. Despite affecting millions worldwide, understanding its genetic pathways remains limited. The choroid plexus (ChP) has been studied in neurodegenerative processes and diseases like MS due to its dysregulation, yet its role in MS pathophysiology remains unclear.