Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning.

J Cheminform

MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, Beijing Frontier Research Center for Biological Structure, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China.

Published: November 2024

Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high-accuracy predictions of small molecule binding sites and can accommodate proteins without a published crystal structure. We trained and tested CLAPE-SMB on the SJC dataset, a non-redundant dataset based on sc-PDB, JOINED, and COACH420, and achieved an MCC of 0.529. We also compiled the UniProtSMB dataset, which merges sites from similar proteins based on raw data from the UniProtKB database, and achieved an MCC of 0.699 on the test set. In addition, CLAPE-SMB achieved an MCC of 0.815 on our intrinsically disordered protein (IDP) dataset, which contains 336 non-redundant sequences. Case studies of DAPK1, RebH, and Nep1 support the potential of this binding site prediction tool to aid in drug design. The code and datasets are freely available at https://github.com/JueWangTHU/CLAPE-SMB.

SCIENTIFIC CONTRIBUTION: CLAPE-SMB combines a pre-trained protein language model with contrastive learning to accurately predict protein-small molecule binding sites, especially for proteins without experimental structures, such as IDPs. Trained across various datasets, this model shows strong adaptability, making it a valuable tool for advancing drug design and understanding protein-small molecule interactions.
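The MCC figures quoted in the abstract are Matthews correlation coefficients, a metric suited to imbalanced binary labels such as binding versus non-binding residues. The following is a minimal sketch of how such a score can be computed from per-residue labels; the function name `mcc` and the toy label vectors are illustrative assumptions and are not taken from the CLAPE-SMB code base.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = binding residue).

    Hypothetical helper for illustration; not part of CLAPE-SMB.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy 10-residue sequence (made-up labels, for illustration only).
y_true = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
score = mcc(y_true, y_pred)
print(f"MCC = {score:.3f}")
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, so it is not inflated by the large majority of non-binding residues in the way plain accuracy would be.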

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542454
DOI: http://dx.doi.org/10.1186/s13321-024-00920-2

Similar Publications

Qingke Pingchuan granules alleviate airway inflammation in COPD exacerbation by inhibiting neutrophil extracellular traps in mice.

Phytomedicine

November 2024

Division of Pulmonary Diseases, State Key Laboratory of Biotherapy, and Department of Respiratory and Critical Care Medicine, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China.

Background: Chronic obstructive pulmonary disease (COPD) imposes a significant global health and socioeconomic burden. Exacerbations of COPD (ECOPD), characterized by heightened airway inflammation and mucus hypersecretion, adversely affect patient health and accelerate disease progression. Qingke Pingchuan (QKPC) granules, a formulation from Traditional Chinese Medicine initially prescribed for acute bronchitis, have shown unexplored potential in ECOPD management, with mechanisms of action yet to be clarified.

Intrinsically disordered protein regions (IDRs) are well established as contributors to intermolecular interactions and the formation of biomolecular condensates. In particular, RNA-binding proteins (RBPs) often harbor IDRs in addition to folded RNA-binding domains that contribute to RBP function. To understand the dynamic interactions of an IDR-RNA complex, we characterized the RNA-binding features of a small (68 residues), positively charged IDR-containing protein, Small ERDK-Rich Factor (SERF).

Molecular docking stands as a pivotal element in the realm of computer-aided drug design (CADD), consistently contributing to advancements in pharmaceutical research. In essence, it employs computer algorithms to identify the "best" match between two molecules, akin to solving intricate three-dimensional jigsaw puzzles. At a more stringent level, the molecular docking challenge entails predicting the accurate bound association state based on the atomic coordinates of two molecules.

BindingDB (bindingdb.org) is a public, web-accessible database of experimentally measured binding affinities between small molecules and proteins, which supports diverse applications including medicinal chemistry, biochemical pathway annotation, training of artificial intelligence models and computational chemistry methods development. This update reports significant growth and enhancements since our last review in 2016.
