A deep learning ensemble for function prediction of hypothetical proteins from pathogenic bacterial species.

Comput Biol Chem

Department of Computer Science, Jamia Millia Islamia, Jamia Nagar, New Delhi, 110025, Delhi, India.

Published: December 2019

AI Article Synopsis

  • Protein function prediction is essential in the post-genomics landscape due to the critical roles proteins play in biological systems, but traditional methods are becoming inefficient due to the massive influx of sequencing data.
  • A shift towards computational approaches integrates various features like sequence data, protein interactions, and chemical properties to improve prediction accuracy, enabling the handling of large datasets more effectively.
  • This work trains a deep learning ensemble model on 9,890 features from 171,212 reviewed bacterial proteins to assign functions across 1,739 GO terms, and reports an F1 measure of 0.7912 on a test set of reviewed proteins.

Article Abstract

Protein function prediction is a crucial task in the post-genomics era because proteins play diverse and irreplaceable roles in biological systems. Traditional methods relied on cost-intensive and time-consuming molecular biology techniques, and they proved ineffective after the surge of sequencing data that followed the advent of cost-effective, advanced sequencing techniques. To match the pace of annotation to that of data generation, there has been a shift to computational approaches based on homology, sequence- and structure-based features, protein-protein interaction networks, phylogenetic profiles, and physicochemical properties. Combining these features has proven promising for improving prediction accuracy. In the present work, we combine sequence, physicochemical property, subsequence, and annotation features, with a total of 9890 features extracted and/or calculated for 171,212 reviewed prokaryotic proteins from 9 bacterial phyla in UniProtKB, to train a supervised deep learning ensemble model that assigns the function of a bacterial hypothetical/unreviewed protein to 1739 GO terms used as functional classes. Being fully dedicated to bacterial organisms, the proposed system is a novel attempt among existing machine learning based protein function prediction systems, which are based on mixed organisms. Experimental results show that the proposed deep learning ensemble model, based on the deep neural network method, achieves an F1 measure of 0.7912 on the prepared Test dataset 1 of reviewed proteins.
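
The abstract does not state how the ensemble combines its member models or how the F1 measure is averaged. As a rough illustration only, the sketch below assumes soft voting (averaging per-GO-term probabilities across models), a 0.5 decision threshold, and micro-averaged F1. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def ensemble_average(prob_sets: List[Matrix]) -> Matrix:
    """Average per-label probabilities across models (soft voting, an assumption)."""
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_labels = len(prob_sets[0][0])
    return [
        [sum(model[i][j] for model in prob_sets) / n_models for j in range(n_labels)]
        for i in range(n_samples)
    ]


def binarize(probs: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Turn probabilities into 0/1 label assignments (threshold is an assumption)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def micro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 3 proteins, 4 hypothetical GO-term classes, 2 hypothetical models.
model_a = [[0.9, 0.2, 0.1, 0.7], [0.3, 0.8, 0.4, 0.1], [0.6, 0.1, 0.9, 0.2]]
model_b = [[0.7, 0.4, 0.3, 0.5], [0.1, 0.6, 0.8, 0.3], [0.2, 0.3, 0.7, 0.4]]
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 0, 1, 0]]

y_pred = binarize(ensemble_average([model_a, model_b]))
score = micro_f1(y_true, y_pred)
print(f"micro-F1 on toy data: {score:.4f}")
```

On this toy input the pooled counts are 4 true positives, 1 false positive, and 1 false negative. Micro-averaging is only one option: with 1739 classes of very different sizes, a macro-averaged or per-protein F1 could give a noticeably different number, so the reported 0.7912 should be read against whichever definition the paper itself uses.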

Source
http://dx.doi.org/10.1016/j.compbiolchem.2019.107147

Publication Analysis

Top Keywords

function prediction (16)
deep learning (12)
learning ensemble (12)
protein function (12)
combination features (8)
ensemble model (8)
function (5)
prediction (5)
based (5)
features (5)
