Drug-target interaction (DTI) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as deorphanization of a new therapeutic target or target identification of a drug candidate arising from phenotypic screens require large-scale predictions across the protein and molecule spaces. DTI prediction relies heavily on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing new interactions to be predicted from the learned patterns. The algorithms must be broadly applicable to enable reliable predictions, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges in meeting these goals: building large, high-quality training datasets and designing prediction methods that scale well enough to be trained on such datasets. First, we introduce LCIdb, a curated, large dataset of DTIs offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains a much higher number of molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet leverages a three-step framework, incorporating efficient computation choices tailored for large datasets and involving the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures the determinants of DTIs and whose structure allows for reduced computational complexity and quasi-Newton optimization, so that the model can handle large training sets. Our method is implemented in open-source software, leveraging GPU parallel computation for efficiency. 
We demonstrate the value of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset and on a publicly available benchmark designed for scaffold hopping problems. Komet is available as open-source software at https://komet.readthedocs.io, and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.
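
The abstract names two ingredients, a Nyström approximation and a Kronecker interaction module, without spelling them out. The sketch below illustrates the general idea under stated assumptions and is not Komet's actual implementation: the function names (rbf_kernel, nystrom_features, kron_scores), the RBF kernel, and the weight matrix W are all hypothetical choices made for illustration. It relies on the standard identity that the inner product between the Kronecker product of a molecule vector and a protein vector and a flattened weight matrix equals phi_m^T W phi_p, so pair features never need to be materialized.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, landmarks, gamma=1.0, eps=1e-8):
    """Explicit low-dimensional features whose inner products approximate
    the kernel, built from a small set of landmark points (Nystrom)."""
    K_ll = rbf_kernel(landmarks, landmarks, gamma)
    K_xl = rbf_kernel(X, landmarks, gamma)
    vals, vecs = np.linalg.eigh(K_ll)      # K_ll is symmetric PSD
    vals = np.maximum(vals, eps)           # guard against tiny eigenvalues
    return K_xl @ vecs / np.sqrt(vals)     # K_xl @ V @ diag(vals ** -0.5)

def kron_scores(mol_feats, prot_feats, mol_idx, prot_idx, W):
    """Score each (molecule, protein) pair k as phi_m^T W phi_p.

    This equals <kron(phi_m, phi_p), W.ravel()> but never builds the
    Kronecker product, whose size is d_mol * d_prot per pair.
    """
    return np.einsum("kd,de,ke->k", mol_feats[mol_idx], W, prot_feats[prot_idx])
```

In this form, the per-pair cost depends on the two feature dimensions rather than on their product being stored for every training pair, which is the kind of saving the abstract attributes to the Kronecker structure. W would be fitted by an optimizer (the abstract mentions a quasi-Newton method); that step is omitted here.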

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423346
DOI: http://dx.doi.org/10.1021/acs.jcim.4c00422
