PDP-CON: prediction of domain/linker residues in protein sequences using a consensus approach.

Piyali Chatterjee, Subhadip Basu, Julian Zubek, Mahantapas Kundu, Mita Nasipuri, Dariusz Plewczynski

J Mol Model

Center of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland.

Published: April 2016

The study focuses on predicting domain and linker residues in protein sequences, crucial for understanding protein function and structure.
A consensus-based machine-learning approach combining six classifiers was developed, and the resulting PDP-CON tool achieved an average accuracy of 0.86 and an average F-measure of 0.91 on independent CASP-8, CASP-9, and CASP-10 targets.
The dataset, source code, and supplementary materials are available for noncommercial use at https://cmaterju.org/cmaterbioinfo/.

The prediction of domain/linker residues in protein sequences is a crucial task in the functional classification of proteins, homology-based protein structure prediction, and high-throughput structural genomics. In this work, a novel consensus-based machine-learning technique was applied for residue-level prediction of the domain/linker annotations in protein sequences using ordered/disordered regions along protein chains and a set of physicochemical properties. Six different classifiers (decision tree, Gaussian naïve Bayes, linear discriminant analysis, support vector machine, random forest, and multilayer perceptron) were exhaustively explored for the residue-level prediction of domain/linker regions. Protein sequences from the curated CATH database were used for training and cross-validation experiments. Test results are reported for the developed PDP-CON tool applied to mutually exclusive, independent proteins from the CASP-8, CASP-9, and CASP-10 databases. An n-star quality consensus approach was used to combine the results yielded by the different classifiers. The average PDP-CON accuracy and F-measure values for the CASP targets were 0.86 and 0.91, respectively. The dataset, source code, and all supplementary materials for this work are available at https://cmaterju.org/cmaterbioinfo/ for noncommercial use.
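
To make the consensus step and the two reported metrics concrete, here is a minimal sketch in Python. It is not the PDP-CON source code. It assumes that an "n-star" consensus marks a residue as positive when at least n of the classifiers vote positive, and that labels are encoded as 1 for the positive class and 0 for the other; the abstract does not spell out either detail. The function names `n_star_consensus` and `accuracy_and_f_measure` are hypothetical.

```python
from typing import List, Sequence, Tuple


def n_star_consensus(votes: Sequence[Sequence[int]], n: int) -> List[int]:
    """Combine per-residue 0/1 votes from several classifiers.

    votes[c][i] is classifier c's label for residue i.
    ASSUMPTION: a residue is labelled 1 when at least n classifiers vote 1.
    """
    if not votes:
        return []
    length = len(votes[0])
    if any(len(v) != length for v in votes):
        raise ValueError("all classifiers must label the same residues")
    if not 1 <= n <= len(votes):
        raise ValueError("n must be between 1 and the number of classifiers")
    return [1 if sum(v[i] for v in votes) >= n else 0 for i in range(length)]


def accuracy_and_f_measure(
    truth: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy, F-measure) for binary residue labels."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("truth and predicted must be non-empty and equal length")
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(truth)
    # F-measure as the harmonic mean of precision and recall,
    # written in the equivalent form 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f_measure = 2 * tp / denominator if denominator else 0.0
    return accuracy, f_measure


# Toy example: six classifiers voting on four residues (made-up labels).
toy_votes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]
consensus = n_star_consensus(toy_votes, n=3)
```

Raising n makes the consensus stricter: with the toy votes above, n=1 labels every residue positive, while n=6 labels none positive.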

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788683
DOI: http://dx.doi.org/10.1007/s00894-016-2933-0
