Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for computing matching statistics which relies on some components of a compressed suffix tree - notably, the longest common prefix (LCP) array. In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of LCP array for Wheeler automata, thus establishing a first clear step towards extending (compressed) suffix tree functionalities to labeled graphs.

Download full-text PDF

Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11146637PMC
http://dx.doi.org/10.1109/dcc55655.2023.00023DOI Listing

Publication Analysis

Top Keywords

matching statistics
12
computing matching
8
compressed suffix
8
suffix tree
8
lcp array
8
statistics wheeler
4
wheeler dfas
4
dfas matching
4
statistics introduced
4
introduced solve
4

Similar Publications

Want AI Summaries of new PubMed Abstracts delivered to your In-box?

Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!