Nanopore Decoding with Speed and Versatility for Data Storage.

Bioinformatics

Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina, USA.

Published: January 2025

Motivation: As nanopore technology reaches ever higher throughput and accuracy, it becomes an increasingly viable candidate for reading out DNA data storage. Nanopore sequencing offers considerable flexibility by allowing long reads, real-time signal analysis, and the ability to read both DNA and RNA. Storage system designs must be flexible and efficient to match nanopore's capabilities, but relatively few have been explored, and many are significantly inefficient in read density, error rate, or compute time. To address these problems, we designed a new decoder that decodes each strand from a single read, achieves low byte error rates, offers high throughput, scales to long reads, and works well for both DNA and RNA molecules. We achieve these results through a novel soft decoding algorithm that can be effectively parallelized on a GPU. Our faster decoder allows us to study a wider range of system designs.

Results: We demonstrate our approach on HEDGES, a state-of-the-art DNA-constrained convolutional code. We implement one hard decoder that runs serially and two soft decoders that run on GPUs. We evaluate each decoder on the same population of nanopore reads collected from a synthesized library of strands. These same strands are synthesized with a T7 promoter to enable RNA transcription and decoding. Our results show that the hard decoder has a byte error rate over 25%, while the prior state-of-the-art soft decoder can achieve an error rate of 2.25%. However, that design is slow, requiring 183 seconds per read. Our new Alignment Matrix Trellis soft decoder improves throughput by 257x, with the trade-off of a higher byte error rate of 3.52%. Furthermore, we use the speed of our algorithm to explore more design options. We show that read densities of 0.33 bits/base can be achieved, which is 4x larger than that of prior MSA-based decoders. We also compare RNA to DNA and find that RNA yields 85% as many error-free reads as DNA.
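The speed and density figures above imply a few derived quantities that the abstract does not state directly. The following is a minimal arithmetic sketch using only the numbers quoted in the Results paragraph. It assumes that the 257x improvement applies directly to the 183 seconds per read figure, and that "4x larger" means four times as large; the function and variable names are illustrative and do not come from the authors' code.

```python
# Figures quoted in the Results paragraph.
PRIOR_SECONDS_PER_READ = 183.0  # prior soft decoder, seconds per read
SPEEDUP = 257.0                 # reported throughput improvement
PRIOR_BYTE_ERROR_PCT = 2.25     # prior soft decoder byte error rate (%)
NEW_BYTE_ERROR_PCT = 3.52       # new soft decoder byte error rate (%)
READ_DENSITY = 0.33             # bits/base achieved
DENSITY_GAIN = 4.0              # assumed: "4x larger" = four times as large


def reads_per_hour(seconds_per_read: float) -> float:
    """Convert a per-read decode time into reads decoded per hour."""
    return 3600.0 / seconds_per_read


def summarize() -> dict:
    """Derive implied quantities from the quoted figures (assumptions above)."""
    new_seconds = PRIOR_SECONDS_PER_READ / SPEEDUP
    return {
        "new_seconds_per_read": new_seconds,
        "prior_reads_per_hour": reads_per_hour(PRIOR_SECONDS_PER_READ),
        "new_reads_per_hour": reads_per_hour(new_seconds),
        "byte_error_increase_points": NEW_BYTE_ERROR_PCT - PRIOR_BYTE_ERROR_PCT,
        "implied_prior_density": READ_DENSITY / DENSITY_GAIN,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, the new decoder would take well under one second per read, at a cost of about 1.27 percentage points in byte error rate.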

Availability and Implementation: Source code for our soft decoder and the data used to generate figures are publicly available in the GitHub repository https://github.com/dna-storage/hedges-soft-decoder (10.5281/zenodo.11454877). All raw FAST5/FASTQ data are available at 10.5281/zenodo.11985454 and 10.5281/zenodo.12014515.

Supplementary Information: Supplementary data are available at Bioinformatics online.


Source
http://dx.doi.org/10.1093/bioinformatics/btaf006


