Since the completion of the Human Genome Project at the turn of the century, there has been an unprecedented proliferation of sequencing data. One consequence is that it has become extremely difficult to store, back up, and migrate the enormous volume of genomic datasets, which continues to expand as the cost of sequencing decreases. A much more efficient and scalable genome compression program is therefore urgently required. In this manuscript, we propose a new Apache Spark-based genome compression method called SparkGC that can run efficiently and cost-effectively on a scalable computational cluster to compress large collections of genomes. SparkGC uses Spark's in-memory computation capabilities to reduce compression time by keeping data active in memory between the first-order and second-order compression. The evaluation shows that the compression ratio of SparkGC is at least 30% better than that of the best state-of-the-art methods. Its compression speed is also at least 3.8 times that of the best state-of-the-art methods on only one worker node, and it scales well with the number of nodes. SparkGC is of significant benefit to genomic data storage and transmission. The source code of SparkGC is publicly available at https://github.com/haichangyao/SparkGC.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310413
DOI: http://dx.doi.org/10.1186/s12859-022-04825-5
