DNA storage is one of the most promising ways for future information storage due to its high data storage density, durable storage time and low maintenance cost. However, errors are inevitable during synthesizing, storing and sequencing. Currently, many error correction algorithms have been developed to ensure accurate information retrieval, but they will decrease storage density or increase computing complexity. Here, we apply the Bloom Filter, a space-efficient probabilistic data structure, to DNA storage to achieve the anti-error, or anti-contamination function. This method only needs the original correct DNA sequences (referred to as target sequences) to produce a corresponding data structure, which will filter out almost all the incorrect sequences (referred to as non-target sequences) during sequencing data analysis. Experimental results demonstrate the universal and efficient filtering capabilities of our method. Furthermore, we employ the Counting Bloom Filter to achieve the file version control function, which significantly reduces synthesis costs when modifying DNA-form files. To achieve cost-efficient file version control function, a modified system based on yin-yang codec is developed.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10981766PMC
http://dx.doi.org/10.1093/bib/bbae125DOI Listing

Publication Analysis

Top Keywords

bloom filter
12
file version
12
version control
12
data storage
8
dna storage
8
storage density
8
data structure
8
sequences referred
8
control function
8
storage
7

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!