Fast trimer statistics facilitate accurate decoding of large random DNA barcode sets even at large sequencing error rates.

PNAS Nexus

Department of Computer Science and Department of Integrative Biology, The University of Texas at Austin, Austin, TX 78712, USA.

Published: November 2022

Predefined sets of short DNA sequences are commonly used as barcodes to identify individual biomolecules in pooled populations. Such use requires either sufficiently small DNA error rates, or else an error-correction methodology. Most existing DNA error-correcting codes (ECCs) correct only one or two errors per barcode in sets of typically ≲10⁴ barcodes. We here consider the use of random barcodes of sufficient length that they remain accurately decodable even with ≳6 errors and even at ∼10% or 20% nucleotide error rates. We show that length ∼34 nt is sufficient even with ≳10⁶ barcodes. The obvious objection to this scheme is that it requires comparing every read to every possible barcode by a slow Levenshtein or Needleman-Wunsch comparison. We show that several orders of magnitude speedup can be achieved by (i) a fast triage method that compares only trimer (three consecutive nucleotide) occurrence statistics, precomputed in linear time for both reads and barcodes, and (ii) the massive parallelism available on even today's commodity-grade Graphics Processing Units (GPUs). With 10⁶ barcodes of length 34 and 10% DNA errors (substitutions and indels), we achieve in simulation 99.9% precision (decode accuracy) with 98.8% recall (read acceptance rate). Similarly high precision with somewhat smaller recall is achievable even with 20% DNA errors. The amortized computation cost on a commodity workstation with two GPUs (2022 capability and price) is estimated as between US$0.15 and US$0.60 per million decoded reads.
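The two-stage idea in the abstract (cheap trimer triage, then an expensive edit-distance check on the survivors only) can be illustrated with a minimal CPU sketch. The abstract does not specify the exact trimer statistic, the number of candidates kept, or the acceptance threshold, so the L1 distance between trimer count vectors, the parameters `n_candidates` and `max_dist`, and all function names below are assumptions made for illustration, not the paper's method. The sketch also assumes a non-empty barcode list and uses no GPU parallelism.

```python
import heapq
import random
from itertools import product

BASES = "ACGT"
# Map each of the 64 possible trimers to a fixed vector position.
TRIMER_INDEX = {"".join(p): i for i, p in enumerate(product(BASES, repeat=3))}


def trimer_counts(seq):
    """Count occurrences of each trimer in one linear pass over seq."""
    counts = [0] * 64
    for i in range(len(seq) - 2):
        idx = TRIMER_INDEX.get(seq[i:i + 3])
        if idx is not None:  # ignore trimers containing non-ACGT symbols
            counts[idx] += 1
    return counts


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (substitutions and indels)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def decode(read, barcodes, barcode_counts, n_candidates=8, max_dist=8):
    """Return the index of the best-matching barcode, or None if rejected.

    Stage 1 (triage): rank barcodes by L1 distance between trimer count
    vectors (an assumed statistic) and keep the n_candidates closest.
    Stage 2: run Levenshtein only on those candidates.
    """
    rc = trimer_counts(read)

    def l1(k):
        return sum(abs(x - y) for x, y in zip(rc, barcode_counts[k]))

    candidates = heapq.nsmallest(n_candidates, range(len(barcodes)), key=l1)
    dist, best = min((levenshtein(read, barcodes[k]), k) for k in candidates)
    return best if dist <= max_dist else None


# Hypothetical toy set: 200 random barcodes of length 34.
rng = random.Random(0)
barcodes = ["".join(rng.choice(BASES) for _ in range(34)) for _ in range(200)]
barcode_counts = [trimer_counts(b) for b in barcodes]  # precomputed once

# Corrupt barcode 17 with one substitution (position 5) and one deletion (position 12).
src = barcodes[17]
read = src[:5] + ("A" if src[5] != "A" else "C") + src[6:12] + src[13:]
print(decode(read, barcodes, barcode_counts))
```

Because the count vectors are precomputed, the triage stage costs a fixed 64 subtractions per read-barcode pair regardless of alignment, which is the kind of uniform, independent work that maps naturally onto GPU threads. Raising `max_dist` trades precision for recall; lowering it does the reverse.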


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9802387
DOI: http://dx.doi.org/10.1093/pnasnexus/pgac252

