Generative models have revolutionized de novo drug design by enabling the on-demand production of molecules with desired physicochemical and pharmacological properties. String-based molecular representations, such as SMILES (Simplified Molecular Input Line Entry System) and SELFIES (Self-Referencing Embedded Strings), have played a pivotal role in the success of generative approaches, thanks to their capacity to encode atom and bond information and their ease of generation. However, such 'atom-level' string representations can be limited in capturing information on chirality and on the synthetic accessibility of the corresponding designs. In this paper, we present fragSMILES, a novel fragment-based molecular representation in the form of a string. fragSMILES encodes fragments in a 'chemically meaningful' way via a novel graph-reduction approach, yielding an efficient, interpretable, and expressive molecular representation that also avoids fragment redundancy. fragSMILES contributes to the field of fragment-based representation by reporting fragments and their 'breaking' bonds independently. Moreover, fragSMILES embeds information on molecular chirality, thereby overcoming known limitations of existing string notations. When compared with SMILES, SELFIES and t-SMILES for de novo design, the fragSMILES notation showed promise in generating molecules with desirable biochemical and scaffold properties.
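To make the 'atom-level' character of SMILES concrete, the sketch below splits a SMILES string into the tokens a generative model would typically emit one at a time, and picks out the tokens that carry chirality marks (`@` / `@@`). It is an illustration only, not the fragSMILES algorithm: the regular expression is a simplified assumption covering common organic-subset SMILES, and the names `SMILES_TOKEN`, `tokenize`, and `chiral_tokens` are hypothetical.

```python
import re

# Simplified, assumed token pattern (not a full SMILES grammar).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"       # bracket atoms, e.g. [C@@H] or [NH3+]
    r"|Br|Cl"           # two-letter organic-subset atoms
    r"|[BCNOSPFI]"      # one-letter aliphatic atoms
    r"|[bcnosp]"        # aromatic atoms
    r"|%\d{2}|\d"       # ring-closure labels
    r"|[=#$:/\\.()\-]"  # bond symbols, branches, and the dot separator
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input that the simplified pattern cannot cover completely.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def chiral_tokens(tokens: list[str]) -> list[str]:
    """Return bracket-atom tokens that carry a tetrahedral chirality mark."""
    return [t for t in tokens if t.startswith("[") and "@" in t]


if __name__ == "__main__":
    alanine = "N[C@@H](C)C(=O)O"  # L-alanine, one stereocentre
    toks = tokenize(alanine)
    print(toks)
    print(chiral_tokens(toks))
```

In this view the stereocentre of L-alanine lives in a single bracket token, `[C@@H]`, whose meaning depends on the order of the neighbouring tokens; this is the kind of atom-level detail that a fragment-level notation aims to handle differently.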

Source: http://dx.doi.org/10.1038/s42004-025-01423-3

Similar Publications

The enduring pathogenicity of can be attributed to its lipid-rich cell wall, with mycolic acids (MAs) being a significant constituent. Different MAs' fluidity and structural adaptability within the bacterial cell envelope significantly influence their physicochemical properties, operational capabilities, and pathogenic potential. Therefore, an accurate conformational representation of various MAs in aqueous media can provide insights into their potential role within the intricate structure of the bacterial cell wall.

AiGPro: a multi-tasks model for profiling of GPCRs for agonist and antagonist.

J Cheminform

January 2025

School of Systems Biomedical Science, Soongsil University, 369 Sangdo-ro, Dongjak-gu, 06978, Seoul, Republic of Korea.

G protein-coupled receptors (GPCRs) play vital roles in various physiological processes, making them attractive drug discovery targets. Meanwhile, deep learning techniques have revolutionized drug discovery by facilitating efficient tools for expediting the identification and optimization of ligands. However, existing models for the GPCRs often focus on single-target or a small subset of GPCRs or employ binary classification, constraining their applicability for high throughput virtual screening.

The TOXIN knowledge graph: supporting animal-free risk assessment of cosmetics.

Database (Oxford)

January 2025

Department of In Vitro Toxicology and Dermato-Cosmetology (IVTD), Vrije Universiteit Brussel, Laarbeeklaan 103, Brussels 1090, Belgium.

The European Union's ban on animal testing for cosmetic products and their ingredients, combined with the lack of validated animal-free methods, poses challenges in evaluating their potential repeated-dose organ toxicity. To address this, innovative strategies like Next-Generation Risk Assessment (NGRA) are being explored, integrating historical animal data with new mechanistic insights from non-animal New Approach Methodologies (NAMs). This paper introduces the TOXIN knowledge graph (TOXIN KG), a tool designed to retrieve toxicological information on cosmetic ingredients, with a focus on liver-related data.

Application of monomer structures and fragments of local symmetry for simulation of glass transition temperatures of polymers.

SAR QSAR Environ Res

January 2025

Interdisciplinary Nanotoxicity Center, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, Jackson, MS, USA.

A scheme for constructing models of the 'structure-glass transition temperature of a polymer' is proposed. It involves the representation of the molecular structure of a polymer through the architecture of monomer units represented through a simplified molecular input-line entry system (SMILES) and the fragments of local symmetry (FLS). The statistical quality of such models is quite good: the determination coefficient values for active training set, passive training set, calibration set, and validation set are 0.
