A partial cover of a string or sequence x of length n, which we model as an array x[1..n], is a repeating substring u of x such that "many" positions in x lie within occurrences of u. A maximal cover (introduced in 2018 by Mhaskar and Smyth as an optimal cover) is a partial cover that, over all partial covers u, maximizes the number of positions covered. Applying data structures also introduced by Mhaskar and Smyth, our software MAXCOVER for the first time enables efficient computation of maximal covers for any x, in particular, as described here, for protein sequences of Arabidopsis, […], and humans. In this protein context, we also compare an extended version of MAXCOVER with existing software (MUMmer's repeat-match) for the closely related task of computing non-extendible repeating substrings (a.k.a. maximal repeats). In practice, MAXCOVER is an order of magnitude faster than MUMmer, with much lower space requirements, while producing more compact output that nevertheless yields a more exact and user-friendly specification of the repeats.
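To make the definitions above concrete, here is a minimal brute-force sketch in Python. It is not the MAXCOVER algorithm and does not use the Mhaskar and Smyth data structures; it simply tries every repeating substring of a short string and keeps the one that covers the most positions. The function names, the reading of "repeating" as "occurs at least twice (overlaps allowed)", and the tie-breaking rule (prefer the longer substring) are all assumptions made for illustration.

```python
def covered_positions(x: str, u: str) -> int:
    """Count the positions of x that lie within at least one occurrence of u."""
    if not u:
        return 0
    covered = [False] * len(x)
    start = x.find(u)
    while start != -1:
        for i in range(start, start + len(u)):
            covered[i] = True
        # Advance by one position so that overlapping occurrences are found too.
        start = x.find(u, start + 1)
    return sum(covered)


def brute_force_maximal_cover(x: str) -> tuple[str, int]:
    """Return (u, count) for a repeating substring u covering the most positions.

    Assumption: ties on count are broken in favour of the longer substring;
    any remaining ties keep the candidate found first.
    Polynomial-time brute force, suitable for short strings only.
    """
    best_u, best_count = "", 0
    seen = set()
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = x[i:j]
            if u in seen:
                continue
            seen.add(u)
            # Skip substrings that occur only once: they are not repeating.
            if x.find(u, x.find(u) + 1) == -1:
                continue
            count = covered_positions(x, u)
            if (count, len(u)) > (best_count, len(best_u)):
                best_u, best_count = u, count
    return best_u, best_count


if __name__ == "__main__":
    print(brute_force_maximal_cover("abcab"))
    print(brute_force_maximal_cover("ababab"))
```

In "abcab" the substring "ab" occurs twice and covers four of the five positions, so it is the partial cover this sketch reports. A brute-force search like this is only practical for toy inputs; the point of MAXCOVER, as described above, is to make the computation efficient for full-length sequences.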
DOI: http://dx.doi.org/10.1089/cmb.2021.0520