Generative models have revolutionized de novo drug design, enabling the on-demand production of molecules with desired physicochemical and pharmacological properties. String-based molecular representations, such as SMILES (Simplified Molecular Input Line Entry System) and SELFIES (Self-Referencing Embedded Strings), have played a pivotal role in the success of generative approaches, thanks to their capacity to encode atom and bond information and their ease of generation. However, such 'atom-level' string representations can be limited in capturing chirality and the synthetic accessibility of the corresponding designs. In this paper, we present fragSMILES, a novel fragment-based molecular representation in string form. fragSMILES encodes fragments in a 'chemically meaningful' way via a novel graph-reduction approach, yielding an efficient, interpretable, and expressive molecular representation that also avoids fragment redundancy. fragSMILES contributes to the field of fragment-based representations by reporting fragments and their 'breaking' bonds independently. Moreover, fragSMILES embeds information on molecular chirality, thereby overcoming known limitations of existing string notations. When compared with SMILES, SELFIES, and t-SMILES for de novo design, the fragSMILES notation showed promise in generating molecules with desirable biochemical and scaffold properties.
DOI: http://dx.doi.org/10.1038/s42004-025-01423-3
Langmuir
January 2025
Center for Condensed Matter Theory, Department of Physics, Indian Institute of Science (IISc), Bangalore 560012, India.
The enduring pathogenicity of Mycobacterium tuberculosis can be attributed to its lipid-rich cell wall, with mycolic acids (MAs) being a significant constituent. The fluidity and structural adaptability of different MAs within the bacterial cell envelope significantly influence their physicochemical properties, operational capabilities, and pathogenic potential. Therefore, an accurate conformational representation of various MAs in aqueous media can provide insights into their potential role within the intricate structure of the bacterial cell wall.
J Cheminform
January 2025
School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, 06978, Seoul, Republic of Korea.
G protein-coupled receptors (GPCRs) play vital roles in various physiological processes, making them attractive drug discovery targets. Meanwhile, deep learning techniques have revolutionized drug discovery by facilitating efficient tools for expediting the identification and optimization of ligands. However, existing models for the GPCRs often focus on single-target or a small subset of GPCRs or employ binary classification, constraining their applicability for high throughput virtual screening.
Commun Chem
January 2025
Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, Bari, Italy.
Generative models have revolutionized de novo drug design, enabling the on-demand production of molecules with desired physicochemical and pharmacological properties. String-based molecular representations, such as SMILES (Simplified Molecular Input Line Entry System) and SELFIES (Self-Referencing Embedded Strings), have played a pivotal role in the success of generative approaches, thanks to their capacity to encode atom and bond information and their ease of generation. However, such 'atom-level' string representations can be limited in capturing chirality and the synthetic accessibility of the corresponding designs.
Database (Oxford)
January 2025
Department of In Vitro Toxicology and Dermato-Cosmetology (IVTD), Vrije Universiteit Brussel, Laarbeeklaan 103, Brussels 1090, Belgium.
The European Union's ban on animal testing for cosmetic products and their ingredients, combined with the lack of validated animal-free methods, poses challenges in evaluating their potential repeated-dose organ toxicity. To address this, innovative strategies like Next-Generation Risk Assessment (NGRA) are being explored, integrating historical animal data with new mechanistic insights from non-animal New Approach Methodologies (NAMs). This paper introduces the TOXIN knowledge graph (TOXIN KG), a tool designed to retrieve toxicological information on cosmetic ingredients, with a focus on liver-related data.
SAR QSAR Environ Res
January 2025
Interdisciplinary Nanotoxicity Center, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS, USA.
A scheme for constructing models of the 'structure-glass transition temperature of a polymer' is proposed. It involves the representation of the molecular structure of a polymer through the architecture of monomer units represented through a simplified molecular input-line entry system (SMILES) and the fragments of local symmetry (FLS). The statistical quality of such models is quite good: the determination coefficient values for active training set, passive training set, calibration set, and validation set are 0.