For virus classification and tracing, one approach is to generate a minimal model from the gene sequences of each virus group, which can then be used for comparative analysis within and between classes and for classifying and tracing new sequences. The starting point for defining a minimal model of a group of gene sequences is to find their longest common subsequence (LCS), but for an arbitrary number of sequences this is a non-deterministic polynomial-time hard (NP-hard) problem. We therefore applied heuristic approaches to finding the LCS, together with newer methods of treating gene sequences, including multiple sequence alignment (MSA) and k-mer natural vector (NV) encoding. To evaluate the algorithms, we ran a five-fold cross-validation classification scheme on a dataset of H1N1 virus non-structural protein 1 (NS1) gene sequences. The results indicate that the MSA-based algorithm achieves the best classification accuracy, while the NV-based algorithm has the advantage in the time complexity of generating minimal models.
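
To make the LCS step concrete, here is a minimal sketch of one simple heuristic: compute the exact LCS of two sequences by dynamic programming, then fold that operation across the group. This is an illustration only, not the authors' algorithm; the function names (`pairwise_lcs`, `greedy_common_subsequence`, `is_subsequence`) and the toy sequences are assumptions made for the example. The fold always returns a subsequence common to every input, but it is not guaranteed to be the longest one.

```python
from functools import reduce


def pairwise_lcs(a: str, b: str) -> str:
    """Exact LCS of two strings via dynamic programming, O(len(a) * len(b))."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


def greedy_common_subsequence(seqs: list[str]) -> str:
    """Heuristic: fold pairwise LCS over the group (hypothetical helper).

    The result is common to all inputs but may be shorter than the true
    multi-sequence LCS, and it can depend on the order of the inputs.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    return reduce(pairwise_lcs, seqs)


# Toy nucleotide strings, invented for illustration (not NS1 data).
toy_group = ["ATGGATTCC", "ATGCATTCA", "TTGGATACC"]
candidate = greedy_common_subsequence(toy_group)
```

Because each fold step only sees the running result and the next sequence, the cost is a sequence of pairwise dynamic-programming passes rather than a search over all sequences at once, which is what makes the exact multi-sequence problem intractable.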


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9858667
DOI: http://dx.doi.org/10.3390/genes14010186

