Optimal compressed representation of high throughput sequence data via light assembly.

Antonio A Ginart, Joseph Hui, Kaiyuan Zhu, Ibrahim Numanagić, Thomas A Courtade, S Cenk Sahinalp, David N Tse

Nat Commun

Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA.

Published: February 2018

The most effective genomic data compression methods either assemble reads into contigs or replace them with their alignment positions on a reference genome. Such methods require significant computational resources, but faster alternatives that avoid using explicit or de novo-constructed references fail to match their performance. Here, we introduce a new reference-free compressed representation for genomic data based on light de novo assembly of reads, where each read is represented as a node in a (compact) trie. We show how to efficiently build such tries to compactly represent reads and demonstrate that among all methods using this representation (including all de novo assembly-based methods), our method achieves the shortest possible output. We also provide a lower bound on the compression rate achievable on uniformly sampled genomic read data, which our method closely approximates. Our method significantly improves on the compression performance of alternatives without compromising speed.
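The idea of storing a read relative to another read it overlaps can be illustrated with a small sketch. This is not the paper's algorithm: the greedy parent search, the `min_overlap` parameter, and the names `find_shift`, `encode`, and `decode` are all assumptions made for illustration, and a practical tool would use an index rather than comparing every pair of reads. Each read becomes a node holding a parent index, a shift into the parent, and only the bases the parent does not already supply.

```python
from typing import List, Optional, Tuple

# Hypothetical node layout: (parent index or None, shift into parent, novel suffix).
Node = Tuple[Optional[int], int, str]


def find_shift(parent: str, child: str, min_overlap: int) -> Optional[int]:
    """Return the smallest shift s such that parent[s:] is a prefix of child
    and the overlap is at least min_overlap bases, or None if there is none."""
    for shift in range(0, len(parent) - min_overlap + 1):
        if child.startswith(parent[shift:]):
            return shift
    return None


def encode(reads: List[str], min_overlap: int = 3) -> List[Node]:
    """Greedily attach each read to the earlier read that leaves the
    shortest novel suffix; reads with no usable parent become roots."""
    nodes: List[Node] = []
    for i, read in enumerate(reads):
        node: Node = (None, 0, read)  # root: stored verbatim
        for j in range(i):  # only earlier reads may be parents, so no cycles
            shift = find_shift(reads[j], read, min_overlap)
            if shift is None:
                continue
            overlap = len(reads[j]) - shift
            if len(read) - overlap < len(node[2]):
                node = (j, shift, read[overlap:])
        nodes.append(node)
    return nodes


def decode(nodes: List[Node]) -> List[str]:
    """Rebuild every read; parents always precede their children."""
    reads: List[str] = []
    for parent, shift, suffix in nodes:
        if parent is None:
            reads.append(suffix)
        else:
            reads.append(reads[parent][shift:] + suffix)
    return reads


reads = ["ACGTACGT", "GTACGTTT", "CGTTTGCA", "TTTT"]
nodes = encode(reads)
assert decode(nodes) == reads
```

In this toy input only the first read is stored in full; the others keep just the bases that extend past their overlap with a parent, which is where the space saving in such a representation comes from.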

Download full-text PDF	Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5805770	PMC
http://dx.doi.org/10.1038/s41467-017-02480-6	DOI Listing

Similar Publications

Estimation of Compressive Strength of Basalt Fiber-Reinforced Kaolin Clay Mixture Using Extreme Learning Machine.

Materials (Basel)

January 2025

Department of Geological Engineering, Firat University, Elazığ 23119, Türkiye.

Zeynep Bala Duranay, Yasemin Aslan Topçuoğlu, Zülfü Gürocak

Background: In this study, the unconfined compressive strength (q) of a mixture consisting of clay reinforced with 24 mm-long basalt fiber was estimated using an extreme learning machine (ELM). The aim is to produce estimates that closely match experimentally obtained data without requiring further experiments. The literature review reveals that the ELM technique has not been applied to predict the compressive strength of basalt fiber-reinforced clay, and this study aims to provide a novel contribution in this area.

Semantic segmentation using synthetic images of underwater marine-growth.

Front Robot AI

January 2025

AAU Energy, Aalborg University, Esbjerg, Denmark.

Christian Mai, Jesper Liniger, Simon Pedersen

Introduction: Subsea applications recently received increasing attention due to the global expansion of offshore energy, seabed infrastructure, and maritime activities; complex inspection, maintenance, and repair tasks in this domain are regularly solved with pilot-controlled, tethered remote-operated vehicles to reduce the use of human divers. However, collecting and precisely labeling submerged data is challenging due to uncontrollable and harsh environmental factors. As an alternative, synthetic environments offer cost-effective, controlled alternatives to real-world operations, with access to detailed ground-truth data.

Does amplitude compression help or hinder attentional neural speech tracking?

J Neurosci

January 2025

Department of Psychology, University of Lübeck, Lübeck, Germany.

Martin Orf, Ronny Hannemann, Jonas Obleser

Amplitude compression is an indispensable feature of contemporary audio production and especially relevant in modern hearing aids. The cortical fate of amplitude-compressed speech signals is not well-studied, however, and may yield undesired side effects: We hypothesize that compressing the amplitude envelope of continuous speech reduces neural tracking. Yet, leveraging such a 'compression side effect' on unwanted, distracting sounds could potentially support attentive listening if effectively reducing their neural tracking.

PEDRA-EFB0: colorectal cancer prognostication using deep learning with patch embeddings and dual residual attention.

Med Biol Eng Comput

January 2025

Radiol Dept, Jiangnan Univ, Affiliated Hosp, Wuxi, 214122, Jiangsu, People's Republic of China.

Zihao Zhao, Hao Wang, Dinghui Wu, Qibing Zhu, Xueping Tan

In computer-aided diagnosis systems, precise feature extraction from CT scans of colorectal cancer using deep learning is essential for effective prognosis. However, existing convolutional neural networks struggle to capture long-range dependencies and contextual information, resulting in incomplete CT feature extraction. To address this, the PEDRA-EFB0 architecture integrates patch embeddings and a dual residual attention mechanism for enhanced feature extraction and survival prediction in colorectal cancer CT scans.

Optimal sparsity in autoencoder memory models of the hippocampus.

bioRxiv

January 2025

Center for Theoretical Neuroscience, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY.

Abhishek Shah, René Hen, Attila Losonczy, Stefano Fusi

Storing complex correlated memories is significantly more efficient when memories are recoded to obtain compressed representations. Previous work has shown that compression can be implemented in a simple neural circuit, which can be described as a sparse autoencoder. The activity of the encoding units in these models recapitulates the activity of hippocampal neurons recorded in multiple experiments.
