Predicting protein-small molecule binding sites, the initial step in structure-guided drug design, remains challenging for proteins lacking experimentally derived ligand-bound structures. Here, we propose CLAPE-SMB, which integrates a pre-trained protein language model with contrastive learning to provide high-accuracy predictions of small molecule binding sites and can accommodate proteins without a published crystal structure. We trained and tested CLAPE-SMB on the SJC dataset, a non-redundant dataset based on sc-PDB, JOINED, and COACH420, and achieved an MCC of 0.529. We also compiled the UniProtSMB dataset, which merges sites from similar proteins based on raw data from the UniProtKB database, and achieved an MCC of 0.699 on the test set. In addition, CLAPE-SMB achieved an MCC of 0.815 on our intrinsically disordered protein (IDP) dataset, which contains 336 non-redundant sequences. Case studies of DAPK1, RebH, and Nep1 support the potential of this binding site prediction tool to aid in drug design. The code and datasets are freely available at https://github.com/JueWangTHU/CLAPE-SMB.

SCIENTIFIC CONTRIBUTION: CLAPE-SMB combines a pre-trained protein language model with contrastive learning to accurately predict protein-small molecule binding sites, especially for proteins without experimental structures, such as IDPs. Trained across various datasets, this model shows strong adaptability, making it a valuable tool for advancing drug design and understanding protein-small molecule interactions.
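The performance figures in the abstract are Matthews correlation coefficients (MCC), a metric that stays informative when binding residues are far rarer than non-binding ones. The following is a minimal sketch of how MCC could be computed from per-residue binary labels. The function name and the example labels are illustrative assumptions and are not taken from the CLAPE-SMB code or datasets.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels (1 = binding residue, 0 = non-binding).

    Illustrative helper; not part of the CLAPE-SMB repository.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the
    # confusion matrix is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical labels for a 10-residue sequence (assumed data).
y_true = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 0, 1, 0, 1, 0, 0]
print(round(matthews_corrcoef(y_true, y_pred), 3))  # prints 0.524
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), so the reported values of 0.529, 0.699, and 0.815 can be compared directly across the three datasets.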
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542454 | PMC |
| http://dx.doi.org/10.1186/s13321-024-00920-2 | DOI Listing |
Phytomedicine
November 2024
Division of Pulmonary Diseases, State Key Laboratory of Biotherapy, and Department of Respiratory and Critical Care Medicine, West China Hospital, West China School of Medicine, Sichuan University, Chengdu, China.
Background: Chronic obstructive pulmonary disease (COPD) imposes a significant global health and socioeconomic burden. Exacerbations of COPD (ECOPD), characterized by heightened airway inflammation and mucus hypersecretion, adversely affect patient health and accelerate disease progression. Qingke Pingchuan (QKPC) granules, a formulation from Traditional Chinese Medicine initially prescribed for acute bronchitis, have shown unexplored potential in ECOPD management, with mechanisms of action yet to be clarified.
Proc Natl Acad Sci U S A
December 2024
HHMI, University of Michigan, Ann Arbor, MI 48109.
Intrinsically disordered protein regions (IDRs) are well established as contributors to intermolecular interactions and the formation of biomolecular condensates. In particular, RNA-binding proteins (RBPs) often harbor IDRs in addition to folded RNA-binding domains that contribute to RBP function. To understand the dynamic interactions of an IDR-RNA complex, we characterized the RNA-binding features of a small (68 residues), positively charged IDR-containing protein, Small ERDK-Rich Factor (SERF).
Commun Inf Syst
October 2024
Dalton Cardiovascular Research Center, University of Missouri-Columbia.
Molecular docking stands as a pivotal element in the realm of computer-aided drug design (CADD), consistently contributing to advancements in pharmaceutical research. In essence, it employs computer algorithms to identify the "best" match between two molecules, akin to solving intricate three-dimensional jigsaw puzzles. At a more stringent level, the molecular docking challenge entails predicting the accurate bound association state based on the atomic coordinates of two molecules.
Nucleic Acids Res
January 2025
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA.
BindingDB (bindingdb.org) is a public, web-accessible database of experimentally measured binding affinities between small molecules and proteins, which supports diverse applications including medicinal chemistry, biochemical pathway annotation, training of artificial intelligence models and computational chemistry methods development. This update reports significant growth and enhancements since our last review in 2016.
J Cheminform
November 2024
MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, Beijing Frontier Research Center for Biological Structure, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China.