The aim of computational molecular design is to identify promising hypothetical molecules that have a predefined set of desired properties. We address the problem of accelerating materials discovery with state-of-the-art machine learning techniques. The method involves two types of prediction: forward and backward. The objective of the forward prediction is to build a set of machine learning models that predict various properties of a given molecule. Inverting the trained forward models through Bayes' law, we derive a posterior distribution for the backward prediction, conditioned on a desired property requirement. By exploring high-probability regions of the posterior with a sequential Monte Carlo technique, molecules that exhibit the desired properties can be created computationally. One major difficulty in the computational creation of molecules is excluding chemically unfavorable structures. To circumvent this issue, we derive a chemical language model that acquires commonly occurring patterns of chemical fragments through natural language processing of ASCII strings of existing compounds, which follow the SMILES chemical language notation. In the backward prediction, the trained language model is used to refine chemical strings so that the properties of the resulting structures fall within the desired property region while chemically unfavorable structures are removed. The method is demonstrated through the design of small organic molecules with property requirements on the HOMO-LUMO gap and internal energy. The R package iqspr is available at the CRAN repository.
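
The backward prediction described above can be illustrated with a small toy example. The sketch below is not the iqspr implementation and does not use its API; it is a minimal Python illustration under loudly stated assumptions. The "language model" is a smoothed bigram model over a hypothetical three-character alphabet, the "forward model" is a made-up property (the number of `O` characters) with an assumed Gaussian predictive spread, and the sampler is a standard tempered sequential Monte Carlo scheme (reweight, resample, Metropolis-Hastings move) targeting a posterior proportional to p(S) · p(Y ∈ U | S). All names, the corpus, and the constants are hypothetical.

```python
import math
import random

ALPHABET = "CNO"
# Hypothetical toy corpus standing in for SMILES strings of existing compounds.
CORPUS = ["CCO", "CCN", "CCOC", "COC", "CNC", "OCCO", "CCCO", "NCCO"]
LENGTH = 6            # fixed string length (a simplification)
TARGET = (2.5, 3.5)   # desired property region U (assumed)
SIGMA = 0.3           # assumed predictive std. dev. of the toy forward model


def train_bigram(corpus, smoothing=0.5):
    """Toy language model: smoothed bigram probabilities p(next | prev)."""
    counts = {a: {b: smoothing for b in ALPHABET} for a in ALPHABET}
    for s in corpus:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1.0
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


def log_prior(s, lm):
    """log p(S): uniform first character, then bigram transitions."""
    lp = -math.log(len(ALPHABET))
    for prev, nxt in zip(s, s[1:]):
        lp += math.log(lm[prev][nxt])
    return lp


def sample_prior(lm, rng):
    """Draw one fixed-length string from the bigram language model."""
    s = [rng.choice(ALPHABET)]
    while len(s) < LENGTH:
        row = lm[s[-1]]
        s.append(rng.choices(list(row), weights=list(row.values()))[0])
    return "".join(s)


def forward_mean(s):
    """Toy forward model: the 'property' is the number of 'O' characters."""
    return float(s.count("O"))


def log_likelihood(s):
    """log p(Y in U | S) under a Gaussian predictive distribution."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    mu = forward_mean(s)
    p = cdf((TARGET[1] - mu) / SIGMA) - cdf((TARGET[0] - mu) / SIGMA)
    return math.log(max(p, 1e-300))  # floor avoids log(0)


def mh_move(s, beta, lm, rng):
    """One Metropolis-Hastings step targeting p(S) * p(Y in U | S)**beta."""
    i = rng.randrange(LENGTH)
    proposal = s[:i] + rng.choice(ALPHABET) + s[i + 1:]  # symmetric proposal
    log_ratio = (log_prior(proposal, lm) + beta * log_likelihood(proposal)
                 - log_prior(s, lm) - beta * log_likelihood(s))
    return proposal if rng.random() < math.exp(min(0.0, log_ratio)) else s


def backward_sample(n_particles=200, n_steps=20, moves_per_step=5, seed=0):
    """Tempered SMC: move particles from the prior toward the posterior."""
    rng = random.Random(seed)
    lm = train_bigram(CORPUS)
    particles = [sample_prior(lm, rng) for _ in range(n_particles)]
    initial = list(particles)
    for t in range(1, n_steps + 1):
        beta, prev_beta = t / n_steps, (t - 1) / n_steps
        # Incremental importance weights for the next tempered target.
        logw = [(beta - prev_beta) * log_likelihood(s) for s in particles]
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        for _ in range(moves_per_step):
            particles = [mh_move(s, beta, lm, rng) for s in particles]
    return initial, particles


def in_region(particles):
    """Fraction of particles whose toy property falls inside U."""
    hits = sum(TARGET[0] <= forward_mean(s) <= TARGET[1] for s in particles)
    return hits / len(particles)


if __name__ == "__main__":
    before, after = backward_sample()
    print("fraction in target region (prior):    ", in_region(before))
    print("fraction in target region (posterior):", in_region(after))
```

In this sketch the language model plays the role of the prior that discourages implausible strings, while the forward model supplies the likelihood of meeting the property requirement. A realistic setting would replace both with models trained on actual SMILES data and would need to handle variable-length strings and SMILES grammar, which this toy example ignores.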

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5393296
DOI: http://dx.doi.org/10.1007/s10822-016-0008-z
