Motivation: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses, but it is limited by sequencing technology, mainly because STRs can exceed the read length. Nanopore sequencing, one of the long-read sequencing technologies, produces very long reads and thus offers more possibilities to study and analyze STRs. However, basecalling of nanopore reads is particularly unreliable in repetitive regions, so direct analysis of the raw nanopore data is required.

Results: Here, we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying it to determine the lengths of 241 STRs, we show that it reduces the mean absolute error of the STR length estimates compared with basecalling and STRique.
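To make the idea of "an automaton plus a dynamic-time-warping-like search" concrete, here is a minimal, illustrative sketch. It is not WarpSTR's implementation. It assumes a hypothetical table with one mean current level per base (real nanopore levels depend on the whole k-mer in the pore), a toy noise model, a single motif of at least two bases, and known flanking sequences. The names `LEVEL`, `build_automaton`, `align`, `count_repeats` and `simulate` are hypothetical. The automaton is a linear prefix, then a cyclic motif, then a linear suffix. Each signal sample is assigned to one state, and the repeat length is read off as the number of times the optimal path takes the loop edge, plus one.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL level table: one mean current value per base. Real nanopore
# signals depend on the whole k-mer inside the pore; this is a toy model.
LEVEL: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def build_automaton(prefix: str, motif: str, suffix: str):
    """Linear prefix -> cyclic motif -> linear suffix.

    Returns the expected level of each state, the predecessor list of each
    state, and the loop edge (last motif state -> first motif state)."""
    if len(motif) < 2:
        raise ValueError("motif must have >= 2 bases so the loop differs from a dwell")
    seq = prefix + motif + suffix
    levels = [LEVEL[b] for b in seq]
    preds: List[List[int]] = [[] for _ in seq]
    for i in range(1, len(seq)):
        preds[i].append(i - 1)  # advance to the next base
    first, last = len(prefix), len(prefix) + len(motif) - 1
    preds[first].append(last)  # loop edge: begin another repeat unit
    return levels, preds, (last, first)


def align(signal: List[float], levels: List[float], preds: List[List[int]]):
    """DTW-style alignment of a signal to a path through the automaton.

    Every sample is assigned to one state; between consecutive samples the
    path either stays in its state (dwell) or follows one automaton edge.
    The path starts in state 0 and must end in the last state."""
    INF = float("inf")
    n = len(levels)
    cost = [[INF] * n for _ in signal]
    back = [[-1] * n for _ in signal]
    cost[0][0] = abs(signal[0] - levels[0])
    for t in range(1, len(signal)):
        for s in range(n):
            best_prev, best = -1, INF
            for p in [s] + preds[s]:  # dwell in s, or arrive via an edge
                if cost[t - 1][p] < best:
                    best_prev, best = p, cost[t - 1][p]
            if best_prev >= 0:
                cost[t][s] = best + abs(signal[t] - levels[s])
                back[t][s] = best_prev
    if cost[-1][-1] == INF:
        raise ValueError("signal too short to reach the final state")
    path = [n - 1]
    for t in range(len(signal) - 1, 0, -1):  # trace back to sample 0
        path.append(back[t][path[-1]])
    path.reverse()
    return cost[-1][-1], path


def count_repeats(path: List[int], loop_edge: Tuple[int, int]) -> int:
    """Repeat units = loop-edge traversals + 1 (the first pass uses no loop)."""
    return 1 + sum(1 for a, b in zip(path, path[1:]) if (a, b) == loop_edge)


def simulate(seq: str, dwell: int = 3) -> List[float]:
    """Toy signal: `dwell` samples per base, level +/- 0.1 alternating offset."""
    out: List[float] = []
    for b in seq:
        for _ in range(dwell):
            out.append(LEVEL[b] + (0.1 if len(out) % 2 == 0 else -0.1))
    return out


levels, preds, loop = build_automaton("TT", "CAG", "TT")
_, path = align(simulate("TT" + "CAG" * 7 + "TT"), levels, preds)
print(count_repeats(path, loop))  # 7
```

Dwelling in a state plays the role of the "stretch" step in dynamic time warping, because one base yields a variable number of samples. The loop edge is what lets a single path represent an unknown number of repeat units, so the length estimate falls out of the optimal alignment rather than out of a basecalled sequence.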

Availability and implementation: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10307940
DOI: http://dx.doi.org/10.1093/bioinformatics/btad388