PIPENN: protein interface prediction from sequence with an ensemble of neural nets.

Bas Stringer, Hans de Ferrante, Sanne Abeln, Jaap Heringa, K Anton Feenstra, Reza Haydarlou

Bioinformatics

Department of Computer Science, IBIVU-Center for Integrative Bioinformatics, Vrije Universiteit, 1081HV Amsterdam, The Netherlands.

Published: April 2022

Motivation: The interactions between proteins and other molecules are essential to many biological and cellular processes. Experimental identification of interface residues is time-consuming, costly and challenging, while protein sequence data are ubiquitous. Consequently, many computational and machine learning approaches have been developed over the years to predict such interface residues from sequence. However, the effectiveness of different Deep Learning (DL) architectures and learning strategies for protein-protein, protein-nucleotide and protein-small molecule interface prediction has not yet been investigated in detail. We therefore explore the prediction of protein interface residues using six DL architectures and various learning strategies with sequence-derived input features.

Results: We constructed a large dataset dubbed BioDL, comprising protein-protein interactions from the PDB, and DNA/RNA and small molecule interactions from the BioLip database. We also constructed six DL architectures and evaluated them on the BioDL benchmarks. The evaluation shows that no single architecture performs best on all instances. An ensemble architecture, which combines all six architectures, consistently achieves peak prediction accuracy. We confirmed these results on the published benchmark set by Zhang and Kurgan (ZK448), and on our own existing curated homo- and heteromeric protein interaction dataset. Our PIPENN sequence-based ensemble predictor outperforms current state-of-the-art sequence-based protein interface predictors on ZK448 on all interaction types, achieving an AUC-ROC of 0.718 for protein-protein, 0.823 for protein-nucleotide and 0.842 for protein-small molecule.
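
As an illustration of the two ideas in the Results, combining several per-residue predictors and scoring the result with AUC-ROC, the following is a minimal sketch. It is not the PIPENN implementation: the abstract does not state how the ensemble combines the six architectures, so simple averaging of per-residue probabilities is assumed here. The function names `ensemble_mean` and `auc_roc` are hypothetical, and the toy scores and labels are made up purely for demonstration.

```python
from typing import List, Sequence


def ensemble_mean(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue interface probabilities across models.

    model_probs[m][i] is the probability that residue i is an interface
    residue according to model m. Plain averaging is an assumption of this
    sketch, not necessarily the combination used by PIPENN.
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("at least one model is required")
    n_residues = len(model_probs[0])
    if any(len(p) != n_residues for p in model_probs):
        raise ValueError("all models must score the same residues")
    return [sum(p[i] for p in model_probs) / n_models for i in range(n_residues)]


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC computed from the rank-sum (Mann-Whitney U) statistic.

    labels are 1 (interface) or 0 (non-interface); tied scores receive
    their average rank.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sort residue indices by ascending score, then assign 1-based ranks.
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: three hypothetical models scoring five residues.
    toy_labels = [1, 0, 1, 0, 0]
    toy_probs = [
        [0.9, 0.2, 0.4, 0.3, 0.1],
        [0.7, 0.4, 0.6, 0.2, 0.3],
        [0.8, 0.1, 0.7, 0.5, 0.2],
    ]
    combined = ensemble_mean(toy_probs)
    print("ensemble AUC-ROC:", auc_roc(toy_labels, combined))
```

Rank-based AUC-ROC is used here because it needs no threshold and handles tied scores; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be preferred.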

Availability And Implementation: Source code and datasets are available at https://github.com/ibivu/pipenn/.

Supplementary Information: Supplementary data are available at Bioinformatics online.

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9004643
DOI: http://dx.doi.org/10.1093/bioinformatics/btac071
