Deep generative models have advanced drug discovery but often generate compounds with limited structural novelty, providing constrained inspiration for medicinal chemists. To address this, we develop TransPharmer, a generative model that integrates ligand-based interpretable pharmacophore fingerprints with a generative pre-training transformer (GPT)-based framework for de novo molecule generation. TransPharmer excels in unconditioned distribution learning, de novo generation, and scaffold elaboration under pharmacophoric constraints.
Many cancer drugs that target cancer cell pathways also impair the immune system. We developed a computational target discovery platform to enable examination of both cancer and immune cells so as to identify pathways that restrain tumor progression and potentiate anti-tumor immunity. Immune-related CRISPR screen analyzer of functional targets (ICRAFT) integrates immune-related CRISPR screen datasets, single-cell RNA sequencing (scRNA-seq) data, and pre-treatment RNA-seq data from clinical trials, enabling a systems-level approach to therapeutic target discovery.
Protein phosphorylation plays a crucial role in regulating a wide range of biological processes, and its dysregulation is strongly linked to various diseases. While many phosphorylation sites have been identified so far, their functionality and regulatory effects are largely unknown. Here, a deep learning model MMFuncPhos, based on a multi-modal deep learning framework, is developed to predict functional phosphorylation sites.
Background: In the face of a growing disparity between high-throughput sequence data and low-throughput experimental studies, the emerging field of deep learning stands as a promising alternative. Generally, many data-driven approaches are capable of facilitating fast and accurate predictions of protein functions. Nevertheless, the inherent statistical nature of deep learning techniques may limit their generalization capabilities when applied to novel nonhomologous proteins that diverge significantly from existing ones.
Computer-assisted synthesis planning has emerged as a valuable tool for organic synthesis. Prediction of reaction conditions is crucial for applying the planned synthesis routes. However, achieving diverse suggestions while ensuring the reasonableness of predictions remains an underexplored challenge.
Angew Chem Int Ed Engl
December 2024
Designing sequences for specific protein backbones is a key step in creating new functional proteins. Here, we introduce GeoSeqBuilder, a deep learning framework that integrates protein sequence generation with side chain conformation prediction to produce the complete all-atom structures for designed sequences. GeoSeqBuilder uses spatial geometric features from protein backbones and explicitly includes three-body interactions of neighboring residues.
Intrinsically disordered proteins (IDPs) are emerging therapeutic targets for human diseases. However, probing their transient conformations remains challenging because of conformational heterogeneity. To address this problem, we developed a biosensor using a point-functionalized silicon nanowire (SiNW) that allows for real-time sampling of single-molecule dynamics.
Despite the exciting progress in target-specific protein binder design, peptide binder design remains challenging due to the flexibility of peptide structures and the scarcity of protein-peptide complex structure data. In this study, we curated a large synthetic data set, referred to as PepPC-F, from the abundant protein-protein interface data and developed DiffPepBuilder, a target-specific peptide binder generation method that utilizes an SE(3)-equivariant diffusion model trained on PepPC-F to codesign peptide sequences and structures. DiffPepBuilder also introduces disulfide bonds to stabilize the generated peptide structures.
Computer-assisted synthesis planning has become increasingly important in drug discovery. While deep-learning models have shown remarkable progress in achieving high accuracies for single-step retrosynthetic predictions, their performances in retrosynthetic route planning need to be checked. This study compares the intricate single-step models with a straightforward template enumeration approach for retrosynthetic route planning on a real-world drug molecule data set.
Molecular docking, a key technique in structure-based drug design, plays pivotal roles in protein-ligand interaction modeling, hit identification, and optimization, in which accurate prediction of the protein-ligand binding mode is essential. Conventional docking approaches perform well in redocking tasks where the conformation of the protein binding pocket in the complex state is known. However, in real-world docking scenarios where the protein binding conformation for a new ligand is unknown, accurately modeling the structure of the bound complex remains challenging, as flexible docking is computationally expensive and inaccurate.
COVID-19, caused by SARS-CoV-2, has led to a global pandemic that continues to impact societies and economies worldwide. The main protease (Mpro) plays a crucial role in SARS-CoV-2 replication and is an attractive target for anti-SARS-CoV-2 drug discovery. Herein, we report a series of 3-oxo-1,2,3,4-tetrahydropyrido[1,2-a]pyrazin derivatives as non-peptidomimetic inhibitors targeting SARS-CoV-2 Mpro through structure-based virtual screening and biological evaluation.
Various biological agents have been developed to target tumor necrosis factor alpha (TNF-α) and its receptor TNFR1 for the treatment of rheumatoid arthritis (RA), whereas small molecules that modulate such cytokine receptors are rarely reported in comparison with biologics. Here, by revealing the mechanism of action of vinigrol, a diterpenoid natural product, we show that inhibition of the protein disulfide isomerase (PDI, PDIA1) by small molecules activates A disintegrin and metalloprotease 17 (ADAM17), which in turn leads to TNFR1 shedding on mouse and human cell membranes. This small-molecule-induced receptor shedding not only effectively blocks the inflammatory response caused by TNF-α in cells, but also reduces the arthritic score and joint damage in the collagen-induced arthritis mouse model.
Although loop epitopes at protein-protein binding interfaces often play key roles in mediating oligomer formation and interaction specificity, their binding sites are underexplored as drug targets owing to their high flexibility, relatively few hot spots, and solvent accessibility. Prior attempts to develop molecules that mimic loop epitopes to disrupt protein oligomers have had limited success. In this study, we used structure-based approaches to design and optimize cyclic-constrained peptides based on loop epitopes at the human phosphoglycerate dehydrogenase (PHGDH) dimer interface, which is an obligate homo-dimer with activity strongly dependent on the oligomeric state.
Intrinsically disordered proteins (IDPs) play crucial roles in cellular processes and hold promise as drug targets. However, the dynamic nature of IDPs remains poorly understood. Here, we construct a single-molecule electrical nanocircuit based on silicon nanowire field-effect transistors (SiNW-FETs) and functionalize it with an individual disordered c-Myc bHLH-LZ domain to enable label-free, in situ, and long-term measurements at the single-molecule level.
The eukaryotic single-stranded DNA (ssDNA)-binding protein Replication Protein A (RPA) plays a crucial role in various DNA metabolic pathways, including DNA replication and repair, by dynamically associating with ssDNA. While the binding of a single RPA molecule to ssDNA has been thoroughly studied, the accessibility of ssDNA is largely governed by the bimolecular behavior of RPA, the biophysical nature of which remains unclear. In this study, we develop a three-step low-complexity ssDNA Curtains method, which, when combined with biochemical assays and a Markov chain model in non-equilibrium physics, allows us to decipher the dynamics of multiple RPA binding to long ssDNA.
Motivation: In recent years, high-throughput sequencing technologies have made large-scale protein sequences accessible. However, their functional annotations usually rely on low-throughput and pricey experimental studies. Computational prediction models offer a promising alternative to accelerate this process.
Ligand binding sites provide essential information for uncovering protein functions and for structure-based drug discovery. To facilitate cavity detection and property analysis, we developed a comprehensive web server, CavityPlus, in 2018. CavityPlus applies the CAVITY program to detect potential binding sites in a given protein structure.
Acta Biochim Biophys Sin (Shanghai)
June 2023
Biomolecular condensates formed by phase separation are involved in many cellular processes. Dysfunctional or abnormal condensates are closely associated with neurodegenerative diseases, cancer and other diseases. Small molecules can effectively regulate protein phase separation by modulating the formation, dissociation, size and material properties of condensates.
Spring viremia of carp virus (SVCV) is a highly pathogenic virus infecting the common carp, yet neither a vaccine nor effective therapies are available to treat spring viremia of carp (SVC). Like all negative-sense viruses, SVCV contains an RNA genome that is encapsidated by the nucleoprotein (N) in the form of a ribonucleoprotein (RNP) complex, which serves as the template for viral replication and transcription. Here, the three-dimensional (3D) structure of SVCV RNP was resolved through cryo-electron microscopy (cryo-EM) at a resolution of 3.
Allostery is an important regulatory mechanism of protein functions. Among allosteric proteins, certain protein structure types are observed more frequently than others. However, how allosteric regulation depends on protein topology remains elusive.
Allostery is fundamental to many biological processes. Because allosteric regulation acts at a distance, it is difficult to forecast how allosteric mutations, modifications, and effector binding impact protein function. In protein engineering, remote mutations cannot be rationally designed without large-scale experimental screening.
The drug development pipeline involves several stages, including in vitro assays, in vivo assays, and clinical trials. For candidate selection, it is important to consider whether a compound will successfully pass through these stages. Using graph neural networks, we developed three subdivisional models to individually predict the capacity of a compound to enter the in vivo testing, clinical trial, and market approval stages.
The development of efficient computational methods for drug target protein identification can compensate for the high cost of experiments and is therefore of great significance for drug development. However, existing structure-based drug target protein-identification algorithms are limited by the insufficient number of proteins with experimentally resolved structures. Moreover, sequence-based algorithms cannot effectively extract information from protein sequences and thus display insufficient accuracy.