The most effective genomic data compression methods either assemble reads into contigs or replace them with their alignment positions on a reference genome. Such methods require significant computational resources, but faster alternatives that avoid explicit or de novo-constructed references fail to match their compression performance. Here, we introduce a new reference-free compressed representation for genomic data based on light de novo assembly of reads, where each read is represented as a node in a (compact) trie. We show how to efficiently build such tries to compactly represent reads and demonstrate that, among all methods using this representation (including all de novo assembly-based methods), our method achieves the shortest possible output. We also provide a lower bound on the compression rate achievable on uniformly sampled genomic read data, which our method closely approximates. Our method significantly improves on the compression performance of alternatives without compromising speed.
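To make the idea of "each read as a node" concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it assumes exact suffix-prefix overlaps only, picks parents greedily among earlier reads, and ignores mismatches, reverse complements, and entropy coding. The names `best_parent`, `encode`, `decode`, and the `MIN_OVERLAP` threshold are hypothetical and chosen for illustration. Each read is stored as a pointer to a parent read, a shift into that parent, and only the bases the parent does not already cover.

```python
from typing import List, Optional, Tuple

# One node per read: (parent index or None, shift into the parent, uncovered suffix).
Node = Tuple[Optional[int], int, str]

MIN_OVERLAP = 4  # hypothetical threshold, for illustration only


def best_parent(read: str, earlier: List[str]) -> Tuple[Optional[int], int]:
    """Find the earlier read whose suffix matches the longest prefix of `read`."""
    best_idx, best_shift, best_len = None, 0, MIN_OVERLAP - 1
    for idx, cand in enumerate(earlier):
        for shift in range(len(cand)):
            piece = cand[shift:]
            if len(piece) <= best_len:
                break  # shorter suffixes cannot beat the current best
            if read.startswith(piece):
                best_idx, best_shift, best_len = idx, shift, len(piece)
                break  # first hit is the longest overlap for this candidate
    return best_idx, best_shift


def encode(reads: List[str]) -> List[Node]:
    """Greedy toy encoder: each read becomes a child of an earlier read, or a root."""
    nodes: List[Node] = []
    for i, read in enumerate(reads):
        parent, shift = best_parent(read, reads[:i])
        if parent is None:
            nodes.append((None, 0, read))  # root: store the read verbatim
        else:
            overlap = len(reads[parent]) - shift
            nodes.append((parent, shift, read[overlap:]))
    return nodes


def decode(nodes: List[Node]) -> List[str]:
    """Rebuild reads in order; parents always precede their children."""
    reads: List[str] = []
    for parent, shift, suffix in nodes:
        prefix = "" if parent is None else reads[parent][shift:]
        reads.append(prefix + suffix)
    return reads


if __name__ == "__main__":
    sample = ["ACGTACGGTC", "TACGGTCAAT", "GGTCAATTTG", "CCCCCCCCCC"]
    encoded = encode(sample)
    for node in encoded:
        print(node)
    assert decode(encoded) == sample
```

In this toy, the second and third sample reads each shrink to a parent index, a shift, and three literal bases, while the last read has no sufficient overlap and becomes a new root. The quadratic parent search is only for readability; an efficient construction and the optimality guarantee described in the abstract are beyond what this sketch attempts.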

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5805770
DOI: http://dx.doi.org/10.1038/s41467-017-02480-6


