A random forest classifier for protein-protein docking models.

Bioinform Adv

Kaust Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia.

Published: December 2021

Abstract: We present the results of a machine learning approach we developed to single out correct 3D docking models of protein-protein complexes obtained with popular docking software. To this end, we generated docking models for each of the 230 complexes in the protein-protein benchmark, version 5, using three different docking programs (HADDOCK, FTDock and ZDOCK), yielding a cumulative set of docking models. Three different machine learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions as features. The Random Forest algorithm outperformed the other two and was selected for further optimization. Applying a feature selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named COnservation Driven Expert System (CoDES). Tests of CoDES on independent datasets, together with a comparison against machine learning methods recently developed for scoring docking decoys, confirm its state-of-the-art ability to discriminate correct from incorrect decoys, both in terms of global parameters and in terms of decoys ranked at the top positions.
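The abstract evaluates classifiers partly by the "decoys ranked at the top positions". A minimal sketch of one common way to quantify this is a top-N success rate: the fraction of complexes with at least one correct decoy among the N best-scored models. The abstract does not define the exact metric used for CoDES, so the function name `top_n_success_rate`, the input layout, the "higher score is better" convention and the toy data below are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout: per complex, a list of (classifier_score, is_correct) pairs.
Decoy = Tuple[float, bool]


def top_n_success_rate(decoys_by_complex: Dict[str, List[Decoy]], n: int) -> float:
    """Fraction of complexes with at least one correct decoy among the n best-scored."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not decoys_by_complex:
        return 0.0
    hits = 0
    for decoys in decoys_by_complex.values():
        # Assumption of this sketch: a higher score means "more likely correct".
        ranked = sorted(decoys, key=lambda d: d[0], reverse=True)
        if any(is_correct for _, is_correct in ranked[:n]):
            hits += 1
    return hits / len(decoys_by_complex)


# Toy data, invented for illustration only (not taken from the paper).
toy = {
    "complex_A": [(0.91, True), (0.40, False), (0.15, False)],
    "complex_B": [(0.80, False), (0.75, True), (0.10, False)],
    "complex_C": [(0.60, False), (0.55, False), (0.20, False)],
}
top1 = top_n_success_rate(toy, 1)
top2 = top_n_success_rate(toy, 2)
```

A metric of this kind complements global measures computed over all decoys, because it reflects what a user who inspects only the first few ranked models per complex would actually see.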

Supplementary Information: Supplementary data are available online.

Software And Data Availability Statement: The docking models are available at https://doi.org/10.5281/zenodo.4012018. The programs underlying this article will be shared on request to the corresponding authors.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9710594
DOI: http://dx.doi.org/10.1093/bioadv/vbab042
