A digital twin for DNA data storage based on comprehensive quantification of errors and biases.

Andreas L Gimpel, Wendelin J Stark, Reinhard Heckel, Robert N Grass

Nat Commun

Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland.

Published: September 2023

Archiving data in synthetic DNA offers unprecedented storage density and longevity. Handling and storage introduce errors and biases into DNA-based storage systems, necessitating the use of error correction coding (ECC), which comes at the cost of added redundancy. However, insufficient data on these errors and biases, as well as a lack of modeling tools, limit data-driven ECC development and experimental design. In this study, we present a comprehensive characterisation of the error sources and biases present in the most common DNA data storage workflows, including commercial DNA synthesis, PCR, decay by accelerated aging, and sequencing-by-synthesis. Using the data from 40 sequencing experiments, we build a digital twin of the DNA data storage process, capable of simulating state-of-the-art workflows and reproducing their experimental results. We showcase the digital twin's ability to replace experiments and rationalize the design of redundancy in two case studies, highlighting opportunities for tangible cost savings and data-driven ECC development.
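To make the idea of simulating such a workflow concrete, the following is a minimal toy sketch and not the authors' digital twin. It assumes independent per-base substitution, deletion, and insertion errors and a lognormal spread in per-sequence read counts as a stand-in for coverage bias. The function names (`mutate`, `simulate_pool`), the parameters, and any rates passed in are hypothetical placeholders, not values or interfaces from the study.

```python
import random

BASES = "ACGT"

def mutate(strand, p_sub, p_del, p_ins, rng):
    """Apply independent per-base deletion, substitution, and insertion errors.

    Assumption: errors are independent of position and sequence context,
    which real workflows generally are not.
    """
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            continue  # base is lost
        if r < p_del + p_sub:
            # replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # spurious extra base
    return "".join(out)

def simulate_pool(strands, mean_reads, p_sub, p_del, p_ins, bias_sigma, seed=0):
    """Return a dict mapping strand index to a list of noisy reads.

    Assumption: per-strand coverage is mean_reads scaled by a lognormal
    factor; bias_sigma = 0 gives perfectly uniform coverage.
    """
    rng = random.Random(seed)
    reads = {}
    for i, strand in enumerate(strands):
        weight = rng.lognormvariate(0.0, bias_sigma)
        n_reads = int(round(mean_reads * weight))
        reads[i] = [mutate(strand, p_sub, p_del, p_ins, rng) for _ in range(n_reads)]
    return reads
```

Calling `simulate_pool(["ACGTACGTAC", "TTGGCCAATT"], mean_reads=10, p_sub=0.01, p_del=0.005, p_ins=0.001, bias_sigma=0.3)` yields a set of noisy reads per design sequence. Sequences that receive zero reads correspond to dropouts, which is the kind of loss that added redundancy is meant to cover.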

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533828
DOI: http://dx.doi.org/10.1038/s41467-023-41729-1

Similar Publications

Successful Diagnosis of Sengers Syndrome Using a Comprehensive Genomic Analysis.

Mol Genet Genomic Med

January 2025

Diagnostics and Therapeutics of Intractable Diseases, Intractable Disease Research Center, Graduate School of Medicine, Juntendo University, Tokyo, Japan.

Kohta Nakamura, Yukiko Yatsuka, Sachie Naito, Akira Hasegawa, Takeya Kasukawa

Background: Sengers syndrome is an autosomal recessive mitochondrial DNA depletion syndrome characterized by hypertrophic cardiomyopathy, congenital cataracts, skeletal myopathy, exercise intolerance, and lactic acidosis. Dysfunction of acylglycerol kinase (AGK) is responsible for the disease, and several AGK gene variants have been reported.

Methods: We employed a comprehensive genomic analysis approach, including whole-genome sequencing and RNA sequencing, combined with various bioinformatics tools.

GENA-LM: a family of open-source foundational DNA language models for long sequences.

Nucleic Acids Res

January 2025

London Institute for Mathematical Sciences, Royal Institution, 21 Albemarle St, London W1S 4BS, UK.

Veniamin Fishman, Yuri Kuratov, Aleksei Shmelev, Maxim Petrov, Dmitry Penzar

Recent advancements in genomics, propelled by artificial intelligence, have unlocked unprecedented capabilities in interpreting genomic sequences, mitigating the need for exhaustive experimental analysis of complex, intertwined molecular processes inherent in DNA function. A significant challenge, however, resides in accurately decoding genomic sequences, which inherently involves comprehending rich contextual information dispersed across thousands of nucleotides. To address this need, we introduce GENA language model (GENA-LM), a suite of transformer-based foundational DNA language models capable of handling input lengths up to 36 000 base pairs.

Mitochondrial DNA Mutations as a Factor in the Heritability of Atherosclerosis and Other Diseases.

Curr Med Chem

January 2025

Laboratory of Angiopathology, Institute of General Pathology and Pathophysiology, 8, Baltiiskaya Street, 125315, Moscow, Russia.

Alexander N Orekhov, Nikolay A Orekhov, Vasily N Sukhorukov, Victoria A Khotina, Tatiana I Kovianova

This review discusses the possibility of inheritance of some diseases through mutations in mitochondrial DNA. These are examples of many mitochondrial diseases that can be caused by mutations in mitochondrial DNA. Symptoms and severity can vary widely depending on the specific mutation and affected tissues.

Development of a rapid and highly sensitive nucleic acid-based diagnostic test for schistosomes, leveraging on identical multi-repeat sequences.

Front Parasitol

March 2024

Center for Research in Infectious Diseases, College of Graduate Studies and Research, Mount Kenya University, Thika, Kenya.

Ombeni Ally, Bernard N Kanoi, Shwetha Kamath, Clement Shiluli, Eric M Ndombi

Introduction: Schistosomiasis (Bilharzia), a neglected tropical disease caused by parasites, afflicts over 240 million people globally, disproportionately impacting Sub-Saharan Africa. Current diagnostic tests, despite their utility, suffer from limitations like low sensitivity. Polymerase chain reaction (PCR) and quantitative real-time PCR (qPCR) remain the most common and sensitive nucleic acid amplification tests.

A reference genome for the eastern bettong (Bettongia gaimardi).

F1000Res

January 2025

Charles Darwin University, Research Institute for the Environment and Livelihoods, Casuarina, Northern Territory, 0909, Australia.

Luke W Silver, Richard J Edwards, Linda Neaves, Adrian Manning, Carolyn J Hogg

The eastern or Tasmanian bettong (Bettongia gaimardi) is one of four extant bettong species and is listed as 'Near Threatened' by the IUCN. We sequenced short read data on the 10x system to generate a reference genome 3.46 Gb in size and contig N50 of 87.
