Effective DNA binding protein prediction by using key features via Chou's general PseAAC.

J Theor Biol

Department of Computer Science and Engineering, United International University, Plot 2, United City, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh. Electronic address:

Published: January 2019

DNA-binding proteins (DBPs) are responsible for several cellular functions, ranging from the immune system to the transport of oxygen. In recent studies, scientists have used supervised machine learning methods that classify DBPs using information from the protein sequence alone. Most of these methods work effectively on the training set, but their performance degrades on the independent test set, which leaves room for improving prediction by reducing over-fitting. In this paper, we extracted several features solely from the protein sequence and carried out two different types of feature selection on them. Our results were comparable on the training set and significantly improved on the independent test set. On the independent test set our accuracy was 82.26%, an improvement of 1.62% over the previous best state-of-the-art methods. Sensitivity and area under the receiver operating characteristic curve on the independent test set were also higher, at 0.95 and 0.823 respectively.
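
The abstract does not name the specific features or feature selection methods used. As a rough illustration of what a sequence-only feature can look like, the sketch below computes the amino acid composition of a protein (the 20 residue frequencies on which the first components of Chou's original PseAAC are based), together with the accuracy and sensitivity metrics quoted above. The function names and the toy inputs are assumptions made for this example; this is not the authors' pipeline.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20 residue frequencies of a protein sequence.

    Hypothetical helper: non-standard characters are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN), with 1 as the DBP label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


if __name__ == "__main__":
    # Toy sequence and labels, invented purely for illustration.
    features = amino_acid_composition("AACD")
    print(len(features), features[:3])
    print(accuracy([1, 1, 0, 1], [1, 0, 0, 1]))
    print(sensitivity([1, 1, 0, 1], [1, 0, 0, 1]))
```

A feature selection step would then keep only a subset of such feature columns before training a classifier, which is one common way to reduce over-fitting on the training set.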


Source
http://dx.doi.org/10.1016/j.jtbi.2018.10.027

