Generating molecules that bind to specific proteins is an important but challenging task in drug discovery. Most previous works generate atoms autoregressively, with the element types and 3D coordinates of atoms generated one by one. However, in real-world molecular systems, interactions among atoms are global, spanning the entire molecule, leading to a pair-coupled energy function among atoms.
Proteins govern most biological functions essential for life, and controllable protein editing has enabled great advances in probing natural systems, creating therapeutic conjugates, and generating novel protein constructs. Recently, machine learning-assisted protein editing (MLPE) has shown promise in accelerating optimization cycles and reducing experimental workloads. However, current methods struggle with the vast combinatorial space of potential protein edits and cannot explicitly conduct protein editing using biotext instructions, limiting their interactivity with human feedback.
J Chem Inf Model
December 2024
Three-dimensional (3D) molecular generation models employ deep neural networks to simultaneously generate both topological representation and molecular conformations. Due to their advantages in utilizing the structural and interaction information on targets, as well as their reduced reliance on existing bioactivity data, these models have attracted widespread attention. However, limited training and testing data sets and the unexpected biases inherent in single evaluation metrics pose a significant challenge in comparing these models in practical settings.
J Chem Theory Comput
December 2024
Enhanced sampling simulations make the computational study of rare events feasible. A large family of such methods crucially depends on the definition of some collective variables (CVs) that could provide a low-dimensional representation of the relevant physics of the process. Recently, many methods have been proposed to semiautomatize the CV design by using machine learning tools to learn the variables directly from the simulation data.
The generation of three-dimensional (3D) molecules based on target structures represents a cutting-edge challenge in drug discovery. Many existing approaches produce molecules with invalid configurations, unphysical conformations, suboptimal drug-like qualities, and limited synthesizability, and they require extensive generation times. To address these challenges, we present 3DSMILES-GPT, a fully language-model-driven framework for 3D molecular generation that utilizes tokens exclusively.
Despite the significant potential of generative models, the low synthesizability of many generated molecules limits their real-world applications. In response to this issue, we develop ClickGen, a deep learning model that utilizes modular reactions like click chemistry to assemble molecules and incorporates reinforcement learning along with an inpainting technique to ensure that the proposed molecules display high diversity, novelty, and strong binding tendency. ClickGen demonstrates superior performance over the other reaction-based generative models in terms of novelty, synthesizability, and docking conformation similarity for existing binders targeting the three proteins.
3D structure-based molecular generation is a successful application of generative AI in drug discovery. Most earlier models follow an atom-wise paradigm, generating molecules with good docking scores but poor molecular properties (like synthesizability and druggability). In contrast, fragment-wise generation offers a promising alternative by assembling chemically viable fragments.
Proteolysis-targeting chimeras (PROTACs) are drugs designed to degrade target proteins via the ubiquitin-proteasome system. With the application of computational biology/chemistry techniques in drug design, numerous computer-aided drug design and artificial intelligence (AI)-driven drug design (CADD/AIDD) methods have recently emerged to facilitate the development of PROTAC drugs. We systematically review the role of in silico tools in PROTAC drug design, emphasizing how computational software can model PROTAC action and structure, predict activity, and assist in molecule design.
Structure-based machine learning algorithms have been utilized to predict the properties of protein-protein interaction (PPI) complexes, such as binding affinity, which is critical for understanding biological mechanisms and disease treatments. While most existing algorithms represent PPI complex graph structures at the atom scale or residue scale, these representations can be computationally expensive or may not sufficiently integrate finer, chemically plausible interaction details for improving predictions. Here, we introduce MCGLPPI, a geometric representation learning framework that combines graph neural networks (GNNs) with MARTINI molecular coarse-grained (CG) models to predict overall PPI properties accurately and efficiently.
The transformation of clinical androgen receptor (AR) antagonists into agonists driven by AR mutations poses a significant challenge in treating prostate cancer (PCa). Novel anti-AR therapeutics combating mutation-induced resistance are required. Herein, by combining structure-based virtual screening and biological evaluation, a high-affinity agonist was first discovered.
The integration of deep learning-based molecular generation models into drug discovery has garnered significant attention for its potential to expedite the development process. Central to this is lead optimization, a critical phase where existing molecules are refined into viable drug candidates. As various methods for deep lead optimization continue to emerge, it is essential to classify these approaches more clearly.
Androgen receptor (AR) is a crucial driver of prostate cancer (PCa), but acquired resistance to AR antagonists significantly undermines their clinical efficacy. We previously discovered coumarin derivative , which is capable of disrupting AR ligand-binding domain dimers, offering the potential for overcoming resistance. However, its poor oral bioavailability limited further development.
The rational design of targeted covalent inhibitors (TCIs) has emerged as a powerful strategy in drug discovery, known for its ability to achieve strong binding affinity and prolonged target engagement. However, the development of covalent drugs is often challenged by the need to optimize both the covalent warhead and non-covalent interactions, alongside the limitations of existing compound libraries. To address these challenges, we present CovalentInDB 2.
A number of anaplastic lymphoma kinase (ALK) inhibitors have been clinically approved, with lorlatinib in particular, a third-generation drug, demonstrating efficacy against various drug-resistant ALK single mutations. However, continued clinical use of lorlatinib has led to the emergence of ALK double mutations conferring resistance to lorlatinib, notably ALK. TPX-0131 is a potential fourth-generation ALK inhibitor currently under development.
CRM1 (chromosomal region maintenance 1, also referred to as exportin 1 or XPO1) plays a crucial role in maintaining the appropriate nuclear levels of tumor suppressor proteins (TSPs), growth regulatory proteins (GRPs), and antiapoptotic proteins, thereby contributing significantly to their anticancer effects. Dysregulation of CRM1-mediated nuclear transport, observed in a range of cancers such as colon cancer as well as autoimmune diseases, highlights its significance in various disease processes. In this paper, we employed a customized structure-based virtual screening campaign to search for novel covalent CRM1 inhibitors and purchased 50 potentially active compounds for in vitro bioassays.
Proteolytic targeting chimeras (PROTACs), as an emerging type of drug, function by proximity-based modalities that narrow the distance between a target protein and the E3 ubiquitin ligase to facilitate the ubiquitination labeling of the target protein for degradation. Although evidence shows that the cooperativity of the PROTAC ternary interaction is one of the key factors affecting the degradation rate of a target protein, PROTAC design utilizing this indicator is still challenging because of the complicated and flexible interactions in a target-PROTAC-E3 ternary system. Therefore, developing reliable and practicable computational methods is of great interest for PROTAC design.
Androgen receptor (AR) is an important therapeutic target for prostate cancer (PCa) treatment, but prolonged use of AR antagonists has led to various drug-resistant mutations. Since all marketed AR antagonists target the ligand binding pocket (LBP) of AR, to mitigate cross-resistance, a new drug pocket named Dimer Interface Pocket was discovered and a novel AR antagonist was identified. showed strong efficacy against PCa but had poor pharmacokinetic properties .
Proteolysis-targeting chimera (PROTAC) is an emerging therapeutic technology that leverages the ubiquitin-proteasome system to target protein degradation. Due to its event-driven mechanistic characteristics, PROTAC has the potential to regulate traditionally non-druggable targets. Recently, AI-aided drug design has accelerated the development of PROTAC drugs.
Molecular generation stands at the forefront of AI-driven technologies, playing a crucial role in accelerating the development of small molecule drugs. The intricate nature of practical drug discovery necessitates the development of a versatile molecular generation framework that can tackle diverse drug design challenges. However, existing methodologies often struggle to encompass all aspects of small molecule drug design, particularly those rooted in language models, especially in tasks like linker design, due to the autoregressive nature of large language model-based approaches.
Annotating active sites in enzymes is crucial for advancing multiple fields including drug discovery, disease research, enzyme engineering, and synthetic biology. Despite the development of numerous automated annotation algorithms, a significant trade-off between speed and accuracy limits their large-scale practical applications. We introduce EasIFA, an enzyme active site annotation algorithm that fuses latent enzyme representations from a protein language model and a 3D structural encoder, and then aligns protein-level information with the knowledge of enzymatic reactions using a multi-modal cross-attention framework.
Major histocompatibility complex (MHC) plays a vital role in presenting epitopes (short peptides from pathogenic proteins) to T-cell receptors (TCRs) to trigger the subsequent immune responses. Vaccine design targeting MHC generally aims to find epitopes with a high binding affinity for MHC presentation. Nevertheless, finding novel epitopes usually requires high-throughput screening of bulk peptide databases, which is time-consuming, labor-intensive, and expensive.
Retrosynthesis is a crucial task in drug discovery and organic synthesis, where artificial intelligence (AI) is increasingly employed to expedite the process. However, existing approaches employ token-by-token decoding methods to translate target molecule strings into corresponding precursors, exhibiting unsatisfactory performance and limited diversity. As chemical reactions typically induce local molecular changes, reactants and products often overlap significantly.
Protein loop modeling is a challenging and highly nontrivial task in protein structure prediction. Despite recent progress, existing methods including knowledge-based, ab initio, hybrid, and deep learning (DL) methods fall substantially short of either atomic accuracy or computational efficiency. To overcome these limitations, we present KarmaLoop, a novel paradigm that distinguishes itself as the first DL method centered on full-atom (encompassing both backbone and side-chain heavy atoms) protein loop modeling.
Analyzing drug-related interactions in the field of biomedicine has been a critical aspect of drug discovery and development. While various artificial intelligence (AI)-based tools have been proposed to analyze drug biomedical associations (DBAs), their feature encoding does not adequately account for crucial biomedical functions and semantic concepts, which still hinders their progress. Since the release of ChatGPT by OpenAI in 2022, large language models (LLMs) have demonstrated rapid growth and significant success across various applications.