Alignment-free sequence comparison approaches have become increasingly popular in computational biology because alignment-based approaches are too inefficient for processing large-scale datasets. However, there is still no general way to determine the optimal value of the critical parameter k for alignment-free approaches. In this article, we address the problem by using multiple k values simultaneously. The method counts the occurrences of each k-mer in a sequence for several different values of k. Two weighting schemes, one based on the maximizing deviation method and the other on a genetic algorithm, are then applied to these counts. We applied the method to enhance three common alignment-free approaches, D2, D2S, and D2*, and evaluated its performance on similarity search and on the recognition of functionally related regulatory sequences. The enhanced approaches achieve better performance than the original approaches in all cases, and much better performance than some other common measures, such as Pcc, Eu, Ma, Ch, Kld, and Cos.
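The counting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmer_counts`, `d2`, and `multi_k_d2` are hypothetical, the per-k weights are supplied by hand instead of being derived by the maximizing deviation method or a genetic algorithm, and the cosine-style normalization of each per-k score is an assumption made here so that scores for different k are on a comparable scale.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2 statistic: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for k-mers absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


def multi_k_d2(seq_a, seq_b, weights):
    """Weighted sum of normalized D2 scores over several k values.

    weights maps k -> weight. The weights here are placeholders chosen
    by the caller (an assumption), not the output of the weighting
    schemes named in the abstract.
    """
    total = 0.0
    for k, w in weights.items():
        # Assumed normalization: divide by the geometric mean of the
        # self-scores so each per-k term lies in [0, 1].
        norm = sqrt(d2(seq_a, seq_a, k) * d2(seq_b, seq_b, k))
        if norm > 0:
            total += w * d2(seq_a, seq_b, k) / norm
    return total


if __name__ == "__main__":
    score = multi_k_d2("ACGTACGTGA", "ACGTTCGTGA", {2: 0.3, 3: 0.3, 4: 0.4})
    print(f"multi-k similarity: {score:.3f}")
```

With weights that sum to 1, two identical sequences score 1 under this sketch, and sequences sharing no k-mers for any chosen k score 0.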
DOI: http://dx.doi.org/10.1109/TCBB.2019.2955081