Accurate Prediction of Antifreeze Protein from Sequences through Natural Language Text Processing and Interpretable Machine Learning Approaches.

J Phys Chem Lett

School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India.

Published: December 2023

Antifreeze proteins (AFPs) bind to growing ice planes owing to their structural complementarity, thereby inhibiting ice-crystal growth through thermal hysteresis. Classifying AFPs from sequence is difficult because of their low sequence similarity; consequently, the usual sequence-similarity algorithms, such as BLAST and PSI-BLAST, are not efficient. Here, a method combining n-gram feature vectors and machine learning models to accelerate the identification of potential AFPs from sequences is proposed. All of these n-gram features are extracted by the k-mer counting method. The comparative analysis reveals that, among the machine learning models tested, XGBoost outperforms the others in predicting AFPs from sequence when pentamers are used as the feature vector. When tested on an independent dataset, our method performed better than other existing ones, with a sensitivity of 97.50%, recall of 98.30%, and F1 score of 99.10%. Further, we used the SHAP method, which provides important insight into the functional activity of AFPs.
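
The k-mer counting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes overlapping k-mers with k = 5 (pentamers) and raw counts as feature values. The function names and toy sequences are hypothetical and invented for illustration; this is not the authors' code. In a pipeline like the one described, the resulting count vectors would presumably be passed to a classifier such as XGBoost, which is not shown here.

```python
from collections import Counter


def kmer_counts(sequence, k=5):
    """Count overlapping k-mers (n-grams of length k) in a protein sequence."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def build_vocabulary(sequences, k=5):
    """Collect every k-mer seen in the given sequences, in sorted order."""
    vocab = set()
    for seq in sequences:
        vocab.update(kmer_counts(seq, k))
    return sorted(vocab)


def vectorize(sequence, vocabulary, k=5):
    """Map a sequence to a count vector aligned with the vocabulary."""
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Toy sequences, invented for illustration only (not real AFPs).
toy_sequences = ["MKTAYIAKQR", "TAYIAKTAYIA"]
vocabulary = build_vocabulary(toy_sequences, k=5)
features = [vectorize(s, vocabulary, k=5) for s in toy_sequences]
```

Each row of `features` has one column per pentamer in the vocabulary. Because the number of possible pentamers over 20 amino acids is very large, a real dataset would likely need a sparse representation rather than plain lists.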

Source
http://dx.doi.org/10.1021/acs.jpclett.3c02817
