Alignment-Free Sequence Comparison With Multiple k Values.

IEEE/ACM Trans Comput Biol Bioinform

Published: January 2022

Alignment-free sequence comparison approaches have become increasingly popular in computational biology because alignment-based approaches are inefficient at processing large-scale datasets. Still, there is no general way to determine the optimal value of the critical parameter k for alignment-free approaches. In this article, we address the problem by using multiple k values simultaneously. The method counts the occurrences of each k-mer in a sequence for several values of k. Two weighting schemes, one based on the maximizing deviation method and one on a genetic algorithm, are then applied to these counts. We applied the method to enhance three common alignment-free approaches, D2, D2S, and D2*, and evaluated its performance on similarity search and on recognition of functionally related regulatory sequences. The enhanced approaches achieve better performance than the original approaches in all cases, and much better performance than some other common measures, such as Pcc, Eu, Ma, Ch, Kld, and Cos.
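To make the idea of combining several k values concrete, here is a minimal Python sketch. It counts overlapping k-mers, computes the standard D2 statistic (the inner product of two k-mer count vectors) for each k, and combines the per-k scores with a weighted sum. The function names, the weighted-sum combination, and the example weights are illustrative assumptions; they are not the authors' implementation, and the weights are placeholders, not values produced by the maximizing deviation method or the genetic algorithm.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Standard D2 statistic: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for k-mers it has not seen, so unshared k-mers add nothing.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def multi_k_d2(seq_a, seq_b, weights):
    """Weighted sum of D2 scores over several k values.

    `weights` maps k -> weight. Both the combination rule and the weights
    are assumptions made for illustration (hypothetical, not from the paper).
    """
    return sum(w * d2(seq_a, seq_b, k) for k, w in weights.items())


if __name__ == "__main__":
    # Placeholder weights for k = 1, 2, 3; a real scheme would learn these.
    example_weights = {1: 0.2, 2: 0.3, 3: 0.5}
    print(multi_k_d2("ACGTACGT", "ACGTTTGA", example_weights))
```

Raw D2 values grow with k-mer counts and sequence length, so a practical version would probably normalize each per-k score before weighting; that step is left out here to keep the sketch short.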

DOI: http://dx.doi.org/10.1109/TCBB.2019.2955081
