Augmented Hill-Climb increases reinforcement learning efficiency for language-based de novo molecule generation.

J Cheminform

Computational Chemistry, Sosei Heptares, Steinmetz Building, Granta Park, Great Abington, Cambridge, CB21 6DG, UK.

Published: October 2022

A plethora of AI-based techniques now exists to conduct de novo molecule generation that can devise molecules conditioned towards a particular endpoint in the context of drug design. One popular approach is using reinforcement learning to update a recurrent neural network or language-based de novo molecule generator. However, reinforcement learning can be inefficient, sometimes requiring up to 10^5 molecules to be sampled to optimize more complex objectives, which poses a limitation when using computationally expensive scoring functions like docking or computer-aided synthesis planning models. In this work, we propose a reinforcement learning strategy called Augmented Hill-Climb based on a simple, hypothesis-driven hybrid between REINVENT and Hill-Climb that improves sample-efficiency by addressing the limitations of both currently used strategies. We compare its ability to optimize several docking tasks with REINVENT and benchmark this strategy against other commonly used reinforcement learning strategies including REINFORCE, REINVENT (versions 1 and 2), Hill-Climb and best agent reminder. We find that optimization ability is improved ~1.5-fold and sample-efficiency is improved ~45-fold compared to REINVENT, while still delivering appealing chemistry as output. Diversity filters were used, and their parameters tuned, to overcome observed failure modes that exploit certain diversity filter configurations. We find that Augmented Hill-Climb outperforms the other reinforcement learning strategies used on six tasks, especially in the early stages of training or for more difficult objectives. Lastly, we show improved performance not only on recurrent neural networks but also on a reinforcement learning stabilized transformer architecture. Overall, we show that Augmented Hill-Climb improves sample-efficiency for language-based de novo molecule generation conditioned via reinforcement learning, compared to the current state-of-the-art.
This makes more computationally expensive scoring functions, such as docking, more accessible on a relevant timescale.
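As a rough illustration of the hybrid the abstract describes, the core update can be sketched as REINVENT's augmented-likelihood loss applied only to the top-scoring fraction of each sampled batch (the Hill-Climb selection step). This is a minimal sketch, assuming that combination; the function and parameter names (`ahc_loss`, `sigma`, `topk_frac`) are illustrative and not taken from the paper's code.

```python
import numpy as np

def ahc_loss(prior_ll, agent_ll, scores, sigma=60.0, topk_frac=0.25):
    """Sketch of one Augmented Hill-Climb batch loss.

    prior_ll, agent_ll: log-likelihoods of each sampled molecule under the
    fixed prior and the trainable agent; scores: scalar rewards in [0, 1].
    """
    # Hill-Climb step: keep only the best-scoring fraction of the batch.
    k = max(1, int(len(scores) * topk_frac))
    top = np.argsort(scores)[::-1][:k]
    # REINVENT step: pull the agent towards the reward-augmented prior
    # likelihood, but computed on the top-k subset only.
    augmented = prior_ll[top] + sigma * scores[top]
    return float(np.mean((augmented - agent_ll[top]) ** 2))
```

Restricting the REINVENT-style regression loss to the top-k molecules is what would let high-reward samples dominate each gradient step, which is consistent with the sample-efficiency gains reported above.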


Source

PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9531503
DOI: http://dx.doi.org/10.1186/s13321-022-00646-z

Publication Analysis

Top Keywords

reinforcement learning: 32
novo molecule: 16
augmented hill-climb: 12
language-based novo: 12
molecule generation: 12
reinforcement: 8
learning: 8
recurrent neural: 8
computationally expensive: 8
expensive scoring: 8

Similar Publications

Successful resolution of approach-avoidance conflict (AAC) is fundamentally important for survival, and its dysregulation is a hallmark of many neuropsychiatric disorders, yet the underlying neural circuit mechanisms are not well elucidated. Converging human and animal research has implicated the anterior/ventral hippocampus (vHPC) as a key node in arbitrating AAC in a region-specific manner. In this study, we sought to target the vHPC CA1 projection pathway to the nucleus accumbens (NAc) to delineate its contribution to AAC decision-making, particularly in the arbitration of learned reward and punishment signals, as well as innate signals.


Background: The electronic compensation (ECOMP) technique for breast radiation therapy provides excellent dose conformity and homogeneity. However, the manual fluence painting process presents a challenge for efficient clinical operation.

Purpose: To facilitate the clinical treatment planning automation of breast radiation therapy, we utilized reinforcement learning (RL) to develop an auto-planning tool that iteratively edits the fluence maps under the guidance of clinically relevant objectives.


Experience with a Self-Management Education Program for Adolescents with Type 1 Diabetes: A Qualitative Study.

Nurs Rep

January 2025

Nursing Research, Innovation and Development Centre of Lisbon (CIDNUR), University of Lisbon, Nursing School of Lisbon, 1600-190 Lisbon, Portugal.

Adolescents with type 1 diabetes face complex challenges associated with the disease, underscoring the importance of developing self-management skills. This study examined participants' perspectives on a type 1 diabetes self-management education program. Focus group interviews were conducted with 32 adolescents with type 1 diabetes who participated in the program and six expert patients.


This paper proposes a Q-learning-driven butterfly optimization algorithm (QLBOA) by integrating the Q-learning mechanism of reinforcement learning into the butterfly optimization algorithm (BOA). In order to improve the overall optimization ability of the algorithm, enhance the optimization accuracy, and prevent the algorithm from falling into a local optimum, a Gaussian mutation mechanism with dynamic variance was introduced, and a migration mutation mechanism was also used to enhance the population diversity of the algorithm. Eighteen benchmark functions were used to compare the proposed method with five classical metaheuristic algorithms and three BOA variants.
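The Q-learning mechanism that QLBOA embeds is the standard tabular update; a minimal sketch follows, where the states and actions (e.g. search phases versus mutation-operator choices) are illustrative assumptions and not taken from the paper.

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).

    Q is a dict of dicts mapping state -> action -> value.
    """
    best_next = max(Q[next_state].values())        # greedy bootstrap target
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
    return Q
```

In a hybrid like QLBOA, the learned Q-values would typically be used to select which search operator to apply at each iteration, with the reward reflecting the fitness improvement achieved.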


Human-Inspired Gait and Jumping Motion Generation for Bipedal Robots Using Model Predictive Control.

Biomimetics (Basel)

January 2025

Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Wakamatsu-ku, Kitakyushu 808-0135, Japan.

In recent years, humanoid robot technology has been developing rapidly due to the need for robots to collaborate with humans or replace them in various tasks, requiring them to operate in complex human environments and placing high demands on their mobility. Developing humanoid robots with human-like walking and hopping abilities has become a key research focus, as these capabilities enable robots to move and perform tasks more efficiently in diverse and unpredictable environments, with significant applications in daily life, industrial operations, and disaster rescue. Currently, methods based on hybrid zero dynamics and reinforcement learning have been employed to enhance the walking and hopping capabilities of humanoid robots; however, model predictive control (MPC) presents two significant advantages: it can adapt to more complex task requirements and environmental conditions, and it allows for various walking and hopping patterns without extensive training and redesign.

