Machine learning techniques for sequence-based prediction of viral-host interactions between SARS-CoV-2 and human proteins.

Biomed J

Department of Computer Science & Engineering, Heritage Institute of Technology, Kolkata, India; Department of Information Technology, Techno Main, Saltlake, Kolkata, India; Department of Computer Science & Engineering, University of Kalyani, Kalyani, India. Electronic address:

Published: October 2020

Background: COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, was declared a pandemic by the World Health Organization on March 11, 2020. Over 15 million people worldwide have already been affected by COVID-19, resulting in more than 0.6 million deaths. Protein-protein interactions (PPIs) play a key role in the cellular process of SARS-CoV-2 infection in the human body. A recent study reported several SARS-CoV-2 proteins that interact with human proteins, but many potential interactions remain to be identified.

Method: In this article, various machine learning models are built to predict PPIs between the virus and human proteins, and the predictions are further validated using biological experiments. The classification models use sequence-based features of human proteins, such as amino acid composition, pseudo amino acid composition, and conjoint triad.
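Two of the feature types named in the Method paragraph can be computed directly from a protein sequence. The sketch below is a minimal, standard-library-only illustration, not the authors' code. It assumes the 20 standard amino acids, skips non-standard symbols such as `X`, and uses the seven-class conjoint triad grouping of Shen et al. (2007), which the abstract does not specify. Counts are normalized to frequencies here; published variants normalize differently. Pseudo amino acid composition is omitted because it depends on physicochemical property scales and weighting parameters not given in the abstract. All function and constant names are hypothetical.

```python
# Hypothetical feature extractors for amino acid composition (AAC) and conjoint triad (CT).

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed conjoint triad grouping (Shen et al., 2007): 7 classes by dipole and side-chain volume.
CT_GROUPS = {
    "A": 0, "G": 0, "V": 0,
    "I": 1, "L": 1, "F": 1, "P": 1,
    "Y": 2, "M": 2, "T": 2, "S": 2,
    "H": 3, "N": 3, "Q": 3, "W": 3,
    "R": 4, "K": 4,
    "D": 5, "E": 5,
    "C": 6,
}


def amino_acid_composition(seq: str) -> list[float]:
    """Return the 20-dim AAC vector: fraction of each standard residue in the sequence."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]  # drop non-standard symbols
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def conjoint_triad(seq: str) -> list[float]:
    """Return the 343-dim CT vector (7**3 class triplets) as triplet frequencies."""
    classes = [CT_GROUPS[aa] for aa in seq.upper() if aa in CT_GROUPS]
    counts = [0] * 343
    # Slide a window of three consecutive residues and count each class triplet.
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * 343


if __name__ == "__main__":
    toy_seq = "MKTAYIAKQR"  # toy sequence, not a real target protein
    print(len(amino_acid_composition(toy_seq)), len(conjoint_triad(toy_seq)))  # 20 343
```

The per-protein vectors can then be concatenated into one input row for a classifier; the 343-dimensional conjoint triad vector captures local residue context that the 20-dimensional composition ignores.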

Result: We have built an ensemble voting classifier from an SVM with a radial kernel, an SVM with a polynomial kernel, and a Random Forest; it achieves higher accuracy, precision, specificity, recall, and F1 score than all other models used in this work. The proposed ensemble model predicted a total of 1326 potential human target proteins of SARS-CoV-2, which were validated using Gene Ontology and KEGG pathway enrichment analysis. Several repurposable drugs targeting the predicted interactions are also reported.
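Combining base classifiers by voting, and computing the five reported metrics, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each base model (two SVMs and a Random Forest, whose training is not shown) has already produced binary labels, with 1 meaning an interacting virus-human protein pair. The abstract does not say whether the ensemble uses hard or soft voting, so hard (majority) voting is an assumption here. The toy labels are invented for illustration and are not results from the paper, and the function names are hypothetical. With fitted estimators, scikit-learn's `VotingClassifier` implements the same idea.

```python
# Hypothetical sketch: hard (majority) voting plus confusion-matrix metrics.
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard voting: each inner sequence holds one model's 0/1 labels for the same samples."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall (sensitivity), specificity, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # avoid division by zero on degenerate inputs

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Toy labels for five hypothetical virus-human protein pairs (1 = interacting).
    svm_a = [1, 0, 1, 1, 0]
    svm_b = [1, 1, 0, 1, 0]
    forest = [0, 0, 1, 1, 0]
    y_true = [1, 0, 1, 0, 0]
    y_pred = majority_vote([svm_a, svm_b, forest])
    print(y_pred)  # [1, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

With three voters and binary labels a tie cannot occur, which is one reason an odd number of base models is convenient for hard voting.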

Conclusion: This study may encourage the identification of potential targets for more effective anti-COVID drug discovery.


Source
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7470713 (PMC)
http://dx.doi.org/10.1016/j.bj.2020.08.003 (DOI)
