Seeing the trees through the forest: sequence-based homo- and heteromeric protein-protein interaction sites prediction using random forest.

Qingzhen Hou, Paul F G De Geest, Wim F Vranken, Jaap Heringa, K Anton Feenstra

Bioinformatics

Center for Integrative Bioinformatics VU (IBIVU), Amsterdam, HV, The Netherlands.

Published: May 2017

Motivation: Genome sequencing is producing an ever-increasing amount of associated protein sequences. Few of these sequences have experimentally validated annotations, however, and computational predictions are becoming increasingly successful in producing such annotations. One key challenge remains the prediction of the amino acids in a given protein sequence that are involved in protein-protein interactions. Such predictions are typically based on machine learning methods that take advantage of the properties and sequence positions of amino acids that are known to be involved in interaction. In this paper, we evaluate the importance of various features using Random Forest (RF), and include as a novel feature backbone flexibility predicted from sequences to further optimise protein interface prediction.

Results: We observe that there is no single sequence feature that enables pinpointing interacting sites in our Random Forest models. However, combining different properties does increase the performance of interface prediction. Our homomeric-trained RF interface predictor is able to distinguish interface from non-interface residues with an area under the ROC curve of 0.72 in a homomeric test-set. The heteromeric-trained RF interface predictor performs better than existing predictors on an independent heteromeric test-set. We trained a more general predictor on the combined homomeric and heteromeric dataset, and show that in addition to predicting homomeric interfaces, it is also able to pinpoint interface residues in heterodimers. This suggests that our random forest model and the features included capture common properties of both homodimer and heterodimer interfaces.
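To make the reported metric concrete, here is a minimal sketch of how an area under the ROC curve can be computed for per-residue interface scores. It uses the pairwise-ranking definition of AUC (the probability that a randomly chosen interface residue scores higher than a randomly chosen non-interface residue). The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper or its data.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise ranking.

    labels: 1 for interface residues, 0 for non-interface residues.
    scores: predicted interface propensity per residue (higher = more likely).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 8 residues, 3 of them in the interface.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

On this scale, 0.5 corresponds to random ranking and 1.0 to perfect separation of interface from non-interface residues, which is the frame of reference for the 0.72 reported above.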

Availability and Implementation: The predictors and test datasets used in our analyses are freely available (http://www.ibi.vu.nl/downloads/RF_PPI/).

Contact: k.a.feenstra@vu.nl.

Supplementary Information: Supplementary data are available at Bioinformatics online.

DOI: http://dx.doi.org/10.1093/bioinformatics/btx005

Similar Publications

Machine learning-based prediction of illness course in major depression: The relevance of risk factors.

J Affect Disord

January 2025

Department of Psychiatry and Psychotherapy, University of Marburg, Germany; Center for Mind, Brain and Behavior (CMBB), University of Marburg, Germany.

Lea Teutenberg, Frederike Stein, Florian Thomas-Odenthal, Paula Usemann, Katharina Brosch

Background: Major depressive disorder (MDD) comes along with an increased risk of recurrence and poor course of illness. Machine learning has recently shown promise in the prediction of mental illness, yet models aiming to predict MDD course are still rare and do not quantify the predictive value of established MDD recurrence risk factors.

Methods: We analyzed N = 571 MDD patients from the Marburg-Münster Affective Disorder Cohort Study (MACS).


Integrating machine learning, suspect and nontarget screening reveal the interpretable fates of micropollutants and their transformation products in sludge.

J Hazard Mater

January 2025

School of Environmental Studies, China University of Geosciences, Wuhan, Hubei 430074, China; National Engineering Research Center of Industrial Wastewater Detoxication and Resource Recovery, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China.

Siying Cai, Xinyu Zhang, Tong Sun, Hao Zhou, Yu Zhang

Activated sludge enriches vast amounts of micropollutants (MPs) when wastewater is treated, posing potential environmental risks. While standard methods typically focus on target analysis of known compounds, the identity, structure, and concentration of transformation products (TPs) of MPs remain less understood. Here, we employed a novel approach that integrates machine learning for the quantification of nontarget TPs with advanced target, suspect, and nontarget screening strategies.


Comparison of data augmentation and classification algorithms based on plastic spectroscopy.

Anal Methods

January 2025

Jiangsu Beier Machinery Co. Ltd, Jiangsu, 215600, China.

Jiachao Luo, Qunbiao Wu, Jin Cao, Haifeng Fang, Chenyang Xu

Plastic waste management is one of the key issues in global environmental protection. Integrating spectroscopy acquisition devices with deep learning algorithms has emerged as an effective method for rapid plastic classification. However, the challenges in collecting plastic samples and spectroscopy data have resulted in a limited number of data samples and an incomplete comparison of relevant classification algorithms.


Machine Learning Models For Preventative Mobile Health Asthma Control.

J Asthma

January 2025

Alan Wong

Introduction: Asthma attacks are set off by triggers such as pollutants from the environment, respiratory viruses, physical activity and allergens. The aim of this research is to create a machine learning model using data from mobile health technology to predict and appropriately warn a patient to avoid such triggers.

Methods: Lightweight machine learning models (XGBoost, Random Forest, and LightGBM) were trained and tested on cleaned asthma data with a 70-30 train-test split.


Non-Invasive Cancer Detection Using Blood Test and Predictive Modeling Approach.

Adv Appl Bioinform Chem

January 2025

Department of Information Technology, Mutah University, Al-Karak, Jordan.

Ahmad S Tarawneh, Ahmad K Al Omari, Enas M Al-Khlifeh, Fatimah S Tarawneh, Mansoor Alghamdi

Purpose: The incidence of cancer, which is a serious public health concern, is increasing. A predictive analysis driven by machine learning was integrated with haematology parameters to create a method for the simultaneous diagnosis of several malignancies at different stages.

Patients And Methods: We analysed a newly collected dataset from various hospitals in Jordan comprising 19,537 laboratory reports (6,280 cancer and 13,257 noncancer cases).
