Representation learning for continuous action spaces is beneficial for efficient policy learning.

Neural Netw

RIKEN Center for Advanced Intelligence Project (AIP), Tokyo, Japan; Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan.

Published: February 2023

Deep reinforcement learning (DRL) overcomes the bottlenecks of traditional reinforcement learning (RL) by drawing on the perception capability of deep learning, and it has been widely applied to real-world problems. Model-free RL, a class of efficient DRL methods, learns state representations simultaneously with the policy in an end-to-end manner when facing large-scale continuous state and action spaces. However, training such a large policy model requires a large number of trajectory samples and considerable training time. In addition, the learned policy often fails to generalize to large-scale action spaces, especially continuous ones. To address these issues, we propose an efficient policy learning method in latent state and action spaces. More specifically, we extend the idea of state representations to action representations for better policy generalization. We also divide the overall learning task into two parts: learning large-scale representation models in an unsupervised manner, and learning a small-scale policy model with RL. The small policy model facilitates policy learning without sacrificing generalization or expressiveness, which the large representation models provide. Finally, the effectiveness of the proposed method is demonstrated in MountainCar, CarRacing, and Cheetah experiments.
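The abstract gives no architectural details, so the following is only a minimal sketch of the general idea of acting in latent state and action spaces, not the authors' implementation. Random linear maps stand in for representation models that would be pretrained without supervision, and all dimensions and names (`state_encoder`, `action_decoder`, `policy`, `act`) are hypothetical.

```python
import math
import random


def make_linear(n_out, n_in, rng):
    """Random weight matrix; a stand-in for a trained model (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, x):
    """Compute tanh(W x) for a list-of-lists weight matrix W."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def count_params(weights):
    """Number of scalar weights in a matrix."""
    return sum(len(row) for row in weights)


# Hypothetical sizes: raw spaces are larger than the latent spaces.
STATE_DIM, ACTION_DIM = 8, 3
LATENT_STATE, LATENT_ACTION = 2, 1

rng = random.Random(0)
# Representation models: assumed pretrained and frozen during policy learning.
state_encoder = make_linear(LATENT_STATE, STATE_DIM, rng)
action_decoder = make_linear(ACTION_DIM, LATENT_ACTION, rng)
# Small policy: the only part an RL algorithm would update in this sketch.
policy = make_linear(LATENT_ACTION, LATENT_STATE, rng)


def act(state):
    """Map a raw state to a raw action through the latent spaces."""
    z_state = apply_linear(state_encoder, state)   # raw state -> latent state
    z_action = apply_linear(policy, z_state)       # latent state -> latent action
    return apply_linear(action_decoder, z_action)  # latent action -> raw action


state = [rng.uniform(-1.0, 1.0) for _ in range(STATE_DIM)]
action = act(state)
policy_params = count_params(policy)
end_to_end_params = STATE_DIM * ACTION_DIM  # a direct linear state-to-action map
```

In this toy setting the trainable policy has `LATENT_STATE * LATENT_ACTION` weights, compared with `STATE_DIM * ACTION_DIM` for a direct linear map from raw states to raw actions, which illustrates why a policy defined between latent spaces can be much smaller.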

Source
http://dx.doi.org/10.1016/j.neunet.2022.12.009

Publication Analysis

Top Keywords

action spaces: 20
policy learning: 16
policy model: 12
learning: 11
policy: 9
continuous action: 8
efficient policy: 8
reinforcement learning: 8
state representations: 8
state action: 8
