Structural classification of proteins based on the computationally efficient recurrence quantification analysis and horizontal visibility graphs.

Michaela Areti Zervou, Effrosyni Doutsi, Pavlos Pavlidis, Panagiotis Tsakalides

Bioinformatics

Department of Computer Science, University of Crete, Heraklion 700 13, Greece.

Published: July 2021

Motivation: Protein structural class prediction is one of the most significant problems in bioinformatics, as it plays a prominent role in understanding the function and evolution of proteins. Designing a prediction method that is both computationally efficient and accurate remains a pressing issue, especially for sequences for which sufficient homologous information cannot be obtained from existing protein sequence databases. Several studies demonstrate the potential of combining chaos game representation with time series analysis tools such as recurrence quantification analysis, complex networks, horizontal visibility graphs (HVG) and others. However, the majority of existing works involve a large number of features and require an exhaustive, time-consuming search for the optimal parameters. To address these problems, this work adopts the generalized multidimensional recurrence quantification analysis (GmdRQA) as an efficient tool that processes a multidimensional time series concurrently and reduces the number of features. In addition, two data-driven algorithms, namely average mutual information and false nearest neighbors, are used to determine the optimal GmdRQA parameters quickly yet precisely.
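To make the recurrence idea concrete, the following is a minimal sketch of a recurrence matrix built from a time-delay embedded multidimensional series. It is not the authors' implementation. The function names, the Euclidean norm, and the threshold `eps` are assumptions made for illustration, and the embedding dimension `dim` and delay `tau` are passed by hand here rather than estimated with false nearest neighbors and average mutual information.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embed a (T, d) series into (N, d * dim) state vectors."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as a single-channel series
    n = x.shape[0] - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this dim/tau")
    # Stack delayed copies of all channels side by side.
    return np.hstack([x[i * tau : i * tau + n] for i in range(dim)])

def recurrence_matrix(states, eps):
    """Binary matrix with R[i, j] = 1 when states i and j lie within eps."""
    diff = states[:, None, :] - states[None, :, :]
    dist = np.linalg.norm(diff, axis=2)  # Euclidean distance (assumed norm)
    return (dist <= eps).astype(int)

def recurrence_rate(R):
    """Fraction of recurrent points, main diagonal included."""
    return R.mean()

if __name__ == "__main__":
    # Hypothetical two-channel series that alternates between two states.
    x = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]])
    R = recurrence_matrix(delay_embed(x, dim=1, tau=1), eps=0.1)
    print(R)
    print(recurrence_rate(R))
```

All channels of the series enter a single state vector, so one recurrence matrix summarizes the whole multidimensional series instead of one matrix per channel.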

Results: Combining GmdRQA with the HVG improves the classification accuracy. Experimental evaluation on a real benchmark dataset demonstrates that our methods achieve performance similar to the state of the art at a smaller computational cost.
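For readers unfamiliar with the HVG, the sketch below builds one from a scalar time series using the standard definition: two samples are linked when every sample strictly between them is lower than both. It is an illustrative, quadratic-time version with hypothetical function names and input, not the code from the repository above, and it does not show how the paper derives features from the graph.

```python
def horizontal_visibility_edges(series):
    """Return HVG edges (i, j), i < j, for a sequence of numbers."""
    edges = []
    n = len(series)
    for i in range(n - 1):
        blocker = float("-inf")  # tallest value strictly between i and j
        for j in range(i + 1, n):
            if blocker < min(series[i], series[j]):
                edges.append((i, j))
            blocker = max(blocker, series[j])
            if blocker >= series[i]:
                break  # nothing beyond j is visible from i
    return edges

def degrees(edges, n):
    """Node degrees of the graph with n nodes and the given edges."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

if __name__ == "__main__":
    series = [3, 1, 2, 4]  # hypothetical input
    edges = horizontal_visibility_edges(series)
    print(edges)
    print(degrees(edges, len(series)))
```

The early `break` relies on the fact that once a value at least as high as `series[i]` has been passed, it blocks the horizontal line of sight from `i` to every later sample.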

Availability And Implementation: The code to reproduce all the results is available at https://github.com/aretiz/protein_structure_classification/tree/main.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btab407
