Archiving data in synthetic DNA offers unprecedented storage density and longevity. Handling and storage introduce errors and biases into DNA-based storage systems, necessitating Error Correction Coding (ECC), which comes at the cost of added redundancy. However, insufficient data on these errors and biases, together with a lack of modeling tools, limit data-driven ECC development and experimental design. In this study, we present a comprehensive characterisation of the error sources and biases present in the most common DNA data storage workflows, including commercial DNA synthesis, PCR, decay by accelerated aging, and sequencing-by-synthesis. Using data from 40 sequencing experiments, we build a digital twin of the DNA data storage process that can simulate state-of-the-art workflows and reproduce their experimental results. In two case studies, we show that the digital twin can replace experiments and rationalize the design of redundancy, highlighting opportunities for tangible cost savings and data-driven ECC development.
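To illustrate the kind of error channel such a simulator has to model, here is a minimal sketch, not the authors' digital twin. It assumes a toy model in which substitutions, insertions and deletions occur independently at fixed per-base rates. It also assumes that strands are sampled by sequencing according to a Poisson distribution, so that the expected fraction of strands never read at mean coverage *c* is e^(-c). The names `simulate_read` and `expected_dropout`, and any rates passed to them, are hypothetical and are not taken from the study.

```python
import math
import random

BASES = "ACGT"


def simulate_read(strand: str, p_sub: float, p_ins: float, p_del: float,
                  rng: random.Random) -> str:
    """Pass one strand through a toy i.i.d. error channel (hypothetical model).

    At each position the base is deleted with probability p_del, substituted
    by a different base with probability p_sub, and copied correctly
    otherwise. After each position, a random base is inserted with
    probability p_ins.
    """
    if p_sub + p_del > 1.0:
        raise ValueError("p_sub + p_del must not exceed 1")
    out = []
    for base in strand:
        r = rng.random()
        if r < p_del:
            pass  # deletion: this base is lost
        elif r < p_del + p_sub:
            # substitution: choose uniformly among the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)  # base copied without error
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion after this position
    return "".join(out)


def expected_dropout(mean_coverage: float) -> float:
    """Expected fraction of strands with zero reads, assuming Poisson sampling."""
    return math.exp(-mean_coverage)


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "ACGT" * 5
    # Illustrative rates only, not values measured in the study.
    print(simulate_read(reference, p_sub=0.01, p_ins=0.001, p_del=0.002, rng=rng))
    print(f"{expected_dropout(10):.2e}")  # exp(-10) ≈ 4.54e-05
```

Under this assumed model, the dropout estimate gives a rough lower bound on how many strands an outer erasure code must be able to recover at a given sequencing depth. The real workflows characterised in the study also show biases that this i.i.d. sketch ignores, such as PCR amplification bias and decay.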
Source | Type
---|---
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533828 | PMC
http://dx.doi.org/10.1038/s41467-023-41729-1 | DOI
Mol Genet Genomic Med
January 2025
Diagnostics and Therapeutics of Intractable Diseases, Intractable Disease Research Center, Graduate School of Medicine, Juntendo University, Tokyo, Japan.
Background: Sengers syndrome is an autosomal recessive mitochondrial DNA depletion syndrome characterized by hypertrophic cardiomyopathy, congenital cataracts, skeletal myopathy, exercise intolerance, and lactic acidosis. Dysfunction of acylglycerol kinase (AGK) is responsible for the disease, and several AGK gene variants have been reported.
Methods: We employed a comprehensive genomic analysis approach, including whole-genome sequencing and RNA sequencing, combined with various bioinformatics tools.
Nucleic Acids Res
January 2025
London Institute for Mathematical Sciences, Royal Institution, 21 Albemarle St, London W1S 4BS, UK.
Recent advancements in genomics, propelled by artificial intelligence, have unlocked unprecedented capabilities in interpreting genomic sequences, mitigating the need for exhaustive experimental analysis of complex, intertwined molecular processes inherent in DNA function. A significant challenge, however, resides in accurately decoding genomic sequences, which inherently involves comprehending rich contextual information dispersed across thousands of nucleotides. To address this need, we introduce GENA language model (GENA-LM), a suite of transformer-based foundational DNA language models capable of handling input lengths up to 36 000 base pairs.
Curr Med Chem
January 2025
Laboratory of Angiopathology, Institute of General Pathology and Pathophysiology, 8, Baltiiskaya Street, 125315, Moscow, Russia.
This review discusses how some diseases may be inherited through mutations in mitochondrial DNA, giving examples of the many mitochondrial diseases that such mutations can cause. Symptoms and severity vary widely depending on the specific mutation and the tissues affected.
Front Parasitol
March 2024
Center for Research in Infectious Diseases, College of Graduate Studies and Research, Mount Kenya University, Thika, Kenya.
Introduction: Schistosomiasis (Bilharzia), a neglected tropical disease caused by parasites, afflicts over 240 million people globally and disproportionately affects Sub-Saharan Africa. Current diagnostic tests, despite their utility, suffer from limitations such as low sensitivity. Polymerase chain reaction (PCR) and quantitative real-time PCR (qPCR) remain the most common and sensitive nucleic acid amplification tests.
F1000Res
January 2025
Charles Darwin University Research Institute for the Environment and Livelihoods, Casuarina, Northern Territory, 0909, Australia.
The eastern or Tasmanian bettong (Bettongia gaimardi) is one of four extant bettong species and is listed as 'Near Threatened' by the IUCN. We generated short-read data on the 10x system to assemble a reference genome 3.46 Gb in size, with a contig N50 of 87.