Computing Maximal Covers for Protein Sequences.

J Comput Biol

Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada.

Published: February 2023

A partial cover of a string x of length n, which we model as an array x[1..n], is a repeating substring u of x such that "many" positions in x lie within occurrences of u. A maximal cover, introduced in 2018 by Mhaskar and Smyth, is a partial cover that, over all partial covers u, maximizes the number of positions covered. Applying data structures also introduced by Mhaskar and Smyth, our software MAXCOVER for the first time enables efficient computation of maximal covers for any string x, and in particular (as described here) for protein sequences of Arabidopsis, [...], and humans. In this protein context, we also compare an extended version of MAXCOVER with existing software (MUMmer's repeat-match) for the closely related task of computing non-extendible repeating substrings (a.k.a. maximal repeats). In practice, MAXCOVER is an order of magnitude faster than MUMmer and has much lower space requirements, while producing more compact output that nevertheless yields a more exact and user-friendly specification of the repeats.
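
To make the two definitions concrete, here is a naive brute-force Python sketch for small inputs. It is not MAXCOVER and does not use the Mhaskar and Smyth data structures; it simply enumerates substrings. Assumptions (ours, not the paper's): positions are 0-based, occurrences of u may overlap, any substring occurring at least twice counts as repeating, `min_len` is a hypothetical lower bound on |u|, and a maximal (non-extendible) repeat is taken to be a repeating substring whose occurrences are both left- and right-diverse. All function names are hypothetical.

```python
from typing import List, Tuple


def occurrences(x: str, u: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) occurrences of u in x."""
    starts = []
    i = x.find(u)
    while i != -1:
        starts.append(i)
        i = x.find(u, i + 1)
    return starts


def covered_positions(x: str, u: str) -> int:
    """Number of positions of x lying inside at least one occurrence of u."""
    covered, reach = 0, 0  # reach = first position not yet counted
    for s in occurrences(x, u):  # starts are increasing, so ends are too
        end = s + len(u)
        covered += end - max(s, reach)  # count only the newly covered part
        reach = end
    return covered


def maximal_covers(x: str, min_len: int = 1) -> Tuple[int, List[str]]:
    """Brute force: best coverage over repeating substrings u with |u| >= min_len,
    plus every u attaining it (hypothetical helper, roughly cubic time or worse)."""
    best, winners, seen = 0, [], set()
    n = len(x)
    for i in range(n):
        for j in range(i + max(min_len, 1), n + 1):
            u = x[i:j]
            if u in seen:
                continue
            seen.add(u)
            if len(occurrences(x, u)) < 2:
                continue  # not a repeating substring, so not a partial cover
            c = covered_positions(x, u)
            if c > best:
                best, winners = c, [u]
            elif c == best:
                winners.append(u)
    return best, sorted(winners)


def maximal_repeats(x: str) -> List[str]:
    """Brute force: repeating substrings that cannot be extended by the same
    character on the left, nor on the right, in all occurrences at once."""
    result, seen = set(), set()
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n + 1):
            u = x[i:j]
            if u in seen:
                continue
            seen.add(u)
            occ = occurrences(x, u)
            if len(occ) < 2:
                continue
            # None marks a string boundary; at most one occurrence can touch each end.
            lefts = {x[s - 1] if s > 0 else None for s in occ}
            rights = {x[s + len(u)] if s + len(u) < n else None for s in occ}
            if len(lefts) > 1 and len(rights) > 1:
                result.add(u)
    return sorted(result)


if __name__ == "__main__":
    print(maximal_covers("xabcyabcz"))
    print(maximal_repeats("xabcyabcz"))
```

On "xabcyabcz" the substring "abc" occurs twice, covers 6 of the 9 positions, and is flanked by different letters on both sides, so the sketch reports it both as the maximal cover and as the only maximal repeat. The brute force is only a reference for checking small cases; the abstract attributes MAXCOVER's efficiency to the Mhaskar and Smyth data structures, which this sketch does not attempt to reproduce.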

DOI: http://dx.doi.org/10.1089/cmb.2021.0520