Since the completion of the Human Genome Project at the turn of the century, there has been an unprecedented proliferation of sequencing data. One consequence is that it has become extremely difficult to store, back up, and migrate the enormous volume of genomic datasets, which continue to expand as the cost of sequencing decreases. A much more efficient and scalable genome compression program is therefore urgently required. In this manuscript, we propose a new Apache Spark-based genome compression method called SparkGC that can run efficiently and cost-effectively on a scalable computational cluster to compress large collections of genomes. SparkGC uses Spark's in-memory computation capabilities to reduce compression time by keeping data active in memory between the first-order and second-order compression. The evaluation shows that the compression ratio of SparkGC is at least 30% better than that of the best state-of-the-art methods. Its compression speed is also at least 3.8 times that of the best state-of-the-art methods on only one worker node, and it scales well with the number of nodes. SparkGC is of significant benefit to genomic data storage and transmission. The source code of SparkGC is publicly available at https://github.com/haichangyao/SparkGC.
Download full-text PDF |
Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310413 | PMC |
http://dx.doi.org/10.1186/s12859-022-04825-5 | DOI Listing |
J Cell Sci
January 2025
Department of Biomedical Engineering, Northwestern University, Evanston, Illinois, 60208, USA.
Disrupted nuclear shape is associated with multiple pathological processes including premature aging disorders, cancer-relevant chromosomal rearrangements, and DNA damage. Nuclear blebs (i.e.
bioRxiv
January 2025
Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, 92093.
Deep learning sequence models trained on personalized genomics can improve variant effect prediction; however, applications of these models are limited by the computational requirements for storing and reading large datasets. We address this with GenVarLoader, which stores personalized genomic data in new memory-mapped formats with optimal data locality to achieve ~1,000x faster throughput and ~2,000x better compression compared to existing alternatives.
Foods
January 2025
College of Food Science and Technology, Huazhong Agricultural University, Wuhan 430070, China.
The inherent physico-chemical properties of commercial konjac powders often limit their application across various industries. While existing modification techniques have produced konjac powders with diverse physical attributes, these methods are frequently associated with high costs and environmental concerns. Hence, there is a critical need to develop a cost-effective, environmentally friendly, and straightforward method for modifying konjac powders.
ACS Agric Sci Technol
January 2025
Laboratory of Organic Electronics, Department of Science and Technology, Linköping University, SE-60174 Norrköping, Sweden.
Plant infiltration techniques, particularly agroinfiltration, have transformed plant science and biotechnology by enabling transient gene expression for genetic engineering of plants or genomic studies. Recently, the use of infiltration has expanded to introduce nanomaterials and polymers in plants to enable nonnative functionalities. Despite its wide use, the impact of the infiltration process on plant physiology needs to be better understood.
JBMR Plus
February 2025
Radiology and Imaging Sciences, National Institutes of Health Clinical Center, National Institutes of Health, Bethesda, MD 20892, United States.
Jansen metaphyseal chondrodysplasia (JMC) is an ultra-rare disorder caused by constitutive activation of parathyroid hormone type 1 receptor (PTH1R). We sought to characterize the craniofacial phenotype of patients with the disease. Six patients with genetically confirmed JMC underwent comprehensive craniofacial phenotyping revealing a distinct facial appearance that prompted a cephalometric analysis demonstrating a pattern of mandibular retrognathia.