Motivation: The vast majority of human genetic disorders are associated with mutations that affect protein-protein interactions by altering wild-type binding affinity. It is therefore extremely important to assess the effect of mutations on protein-protein binding free energy to assist the development of therapeutic solutions. Currently, the most popular approaches rely on structural information to make their predictions, which precludes their use in genome-scale investigations. Indeed, with the progress of genomic sequencing, researchers frequently need to assess the effect of mutations for which no structure is available.

Results: Here, we report SAAMBE-SEQ, a Gradient Boosting Decision Tree machine learning algorithm that is completely sequence-based and does not require structural information at all. SAAMBE-SEQ utilizes 80 features representing evolutionary information, sequence-based features and the change of physical properties upon mutation at the mutation site. The approach is shown to achieve a Pearson correlation coefficient (PCC) of 0.83 in 5-fold cross-validation in a benchmarking test against experimentally determined binding free energy change (ΔΔG). Further, a blind test set (no-STRUC) is compiled by collecting experimental ΔΔG upon mutation for protein complexes for which no structure is available, and is used to benchmark SAAMBE-SEQ, resulting in a PCC in the range of 0.37-0.46. The accuracy of the SAAMBE-SEQ method is found to be either better than or comparable to that of the most advanced structure-based methods. SAAMBE-SEQ is very fast, is available as a webserver and as stand-alone code, and utilizes only sequence information; it is thus applicable to genome-scale investigations of the effect of mutations on protein-protein interactions.
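As a rough illustration of the evaluation protocol described above (PCC between predicted and experimental ΔΔG under 5-fold cross-validation), the following is a minimal sketch and not the SAAMBE-SEQ implementation. It makes several assumptions: the data are synthetic, a single made-up feature replaces the 80 features, and a least-squares line stands in for the Gradient Boosting Decision Tree model. The function names (`pearson`, `kfold_indices`, `fit_line`, `cross_validated_pcc`) are hypothetical, and pooling the out-of-fold predictions before computing one PCC is one common convention that the abstract does not specify.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model (NOT gradient boosting): least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def cross_validated_pcc(features, ddg, k=5, seed=0):
    """Pool out-of-fold predictions over k folds; return their PCC with the targets."""
    preds = [0.0] * len(ddg)
    for fold in kfold_indices(len(ddg), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(ddg)) if i not in held_out]
        # Fit only on the training folds, then predict the held-out fold.
        a, b = fit_line([features[i] for i in train], [ddg[i] for i in train])
        for i in fold:
            preds[i] = a * features[i] + b
    return pearson(preds, ddg)


# Synthetic data for illustration only: one made-up feature and a noisy
# linear "ddG" target. None of these values come from the paper.
rng = random.Random(42)
features = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ddg = [2.0 * x + rng.gauss(0.0, 0.5) for x in features]
pcc = cross_validated_pcc(features, ddg)
```

Every prediction here comes from a model that never saw the corresponding data point during fitting, which is the property that makes a cross-validated PCC a fairer estimate than a correlation computed on the training data.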

Availability and Implementation: SAAMBE-SEQ is available at http://compbio.clemson.edu/saambe_webserver/indexSEQ.php#started.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8128451
DOI: http://dx.doi.org/10.1093/bioinformatics/btaa761

Similar Publications

GEMIN5 and neurodevelopmental diseases: from functional insights to disease perception.

Neural Regen Res

January 2025

Genome Dynamics and Function, Centro de Biología Molecular Severo Ochoa, CSIC-UAM, Madrid, Spain.

GEMIN5 is a predominantly cytoplasmic multifunctional protein, known to be involved in recognizing snRNAs through its WD40 repeats domain placed at the N-terminus. A dimerization domain in the middle region acts as a hub for protein-protein interaction, while a non-canonical RNA-binding site is placed towards the C-terminus. The singular organization of structural domains present in GEMIN5 enables this protein to perform multiple functions through its ability to interact with distinct partners, both RNAs and proteins.

Objectives: To explore the mechanism by which (PSD) inhibits invasion and metastasis of triple-negative breast cancer (TNBC).

Methods: The public databases were used to identify the potential targets of PSD and the invasion and metastasis targets of TNBC to obtain the intersection targets between PSD and TNBC. The "PSD-target-disease" interaction network was constructed and protein-protein interaction (PPI) analysis was performed to obtain the core targets, which were analyzed for KEGG pathway and GO functional enrichment.

Introduction: The COVID-19 pandemic has necessitated rapid advancements in therapeutic discovery. This study presents an integrated approach combining machine learning (ML) and network pharmacology to identify potential non-covalent inhibitors against pivotal proteins in COVID-19 pathogenesis, specifically B-cell lymphoma 2 (BCL2) and Epidermal Growth Factor Receptor (EGFR).

Method: Employing a dataset of 13,107 compounds, ML algorithms such as k-Nearest Neighbors (kNN), Support Vector Machine (SVM), Random Forest (RF), and Naïve Bayes (NB) were utilized for screening and predicting active inhibitors based on molecular features.

Decoding the blueprint of receptor binding by filoviruses through large-scale binding assays and machine learning.

Cell Host Microbe

January 2025

Department of Pathology, Microbiology, and Immunology, School of Veterinary Medicine, University of California, Davis, CA 95616, USA.

Evidence suggests that bats are important hosts of filoviruses, yet the specific species involved remain largely unidentified. Niemann-Pick C1 (NPC1) is an essential entry receptor, with amino acid variations influencing viral susceptibility and species-specific tropism. Herein, we conducted combinatorial binding studies with seven filovirus glycoproteins (GPs) and NPC1 orthologs from 81 bat species.

Protein binding and folding through an evolutionary lens.

Curr Opin Struct Biol

January 2025

Department of Medical Biochemistry and Microbiology, Uppsala University, BMC, Box 582, SE-75123 Uppsala, Sweden.

Protein-protein associations are often mediated by an intrinsically disordered protein region interacting with a folded domain in a coupled binding and folding reaction. Classic physical organic chemistry approaches together with structural biology have shed light on mechanistic aspects of such reactions. Further insight into general principles may be obtained by interpreting the results through an evolutionary lens.
