Training agents via deep reinforcement learning with sparse rewards for robotic control tasks in vast state spaces is a major challenge, because successful experiences are rare. To address this problem, two recent methods, hindsight experience replay (HER) and aggressive rewards to counter bias in HER (ARCHER), reuse unsuccessful experiences by treating them as successful experiences that achieve different goals, that is, hindsight experiences. In these methods, hindsight experience is used at a fixed sampling rate during training. However, this usage introduces bias, owing to a distinct optimal policy, and does not allow hindsight experience to take on variable importance at different stages of training. In this article, we investigate the impact of a variable sampling rate, representing the variable proportion of hindsight experience, on training performance and propose a sampling rate decay strategy that decreases the number of hindsight experiences as training proceeds. The proposed method is validated on three robotic control tasks included in the OpenAI Gym suite. The experimental results demonstrate that the proposed method achieves better training performance and faster convergence than HER and ARCHER on two of the three tasks, and comparable training performance and convergence speed on the third.
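
The idea of a decaying hindsight sampling rate can be illustrated with a minimal sketch. The abstract does not state the decay schedule, so the linear schedule, the start and end rates, the dictionary layout of a transition, and the function names `hindsight_rate`, `relabel`, and `sample_batch` are all assumptions made here for illustration, not the authors' implementation. The reward rule follows the common sparse-reward convention of 0 on success and -1 otherwise, with exact equality standing in for a distance threshold.

```python
import random


def hindsight_rate(step, total_steps, start=0.8, end=0.0):
    """Assumed linear schedule: fraction of samples relabeled with hindsight goals."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def relabel(transition, new_goal):
    """Return a copy of the transition with its goal replaced by new_goal.

    The reward is recomputed for the new goal: 0.0 if the transition's own
    achieved goal matches it, -1.0 otherwise (assumed sparse-reward convention).
    """
    relabeled = dict(transition)
    relabeled["goal"] = new_goal
    relabeled["reward"] = 0.0 if transition["achieved_goal"] == new_goal else -1.0
    return relabeled


def sample_batch(episode, batch_size, step, total_steps, rng=random):
    """Sample transitions, relabeling each with probability hindsight_rate(step)."""
    rate = hindsight_rate(step, total_steps)
    batch = []
    for _ in range(batch_size):
        t = rng.randrange(len(episode))
        transition = episode[t]
        if rng.random() < rate:
            # Pick a goal achieved at the same or a later step of the episode.
            future = rng.randrange(t, len(episode))
            transition = relabel(transition, episode[future]["achieved_goal"])
        batch.append(transition)
    return batch
```

Early in training most sampled transitions are relabeled, which supplies successful experience when real successes are rare; as `step` approaches `total_steps` the batch consists increasingly of the original transitions and their original goals.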


Source
http://dx.doi.org/10.1109/TCYB.2020.2990722


Similar Publications

While pre-verbal infants may be sensitive to others' mental states, they are not able to accurately answer questions about them until several years later, an ability referred to as having a theory of mind. Here we ask whether infant social-cognitive sensitivity is subserved by the same brain mechanisms as those that support theory of mind in childhood. To do so, we explored the relationship between functional sensitivity of the right temporal-parietal junction to mental state processing in infancy, a region known to underlie theory of mind in older children, and explicit theory of mind reasoning in the same group several years later.


Learning from Hindsight: Examining Autonomic, Inflammatory, and Endocrine Stress Biomarkers and Mental Health in Healthy Terrorism Survivors Many Years Later.

Prehosp Disaster Med

January 2025

Assistant Professor, Department of Internal Medicine, UT Southwestern Medical Center, Statistician/Section Chief of Analytics, Research Service, VA North Texas HCS, Dallas, Texas, USA.

Introduction: Terrorism and trauma survivors often experience changes in biomarkers of autonomic, inflammatory and hypothalamic-pituitary-adrenal (HPA) axis assessed at various times. Research suggests interactions of these systems in chronic stress.

Study Objective: This unprecedented retrospective study explores long-term stress biomarkers in three systems in terrorism survivors.


Agile and adaptive maneuvers such as fall recovery, high-speed turning, and sprinting in the wild are challenging for legged systems. We propose a Curricular Hindsight Reinforcement Learning (CHRL) that learns an end-to-end tracking controller that achieves powerful agility and adaptation for the legged robot. The two key components are (i) a novel automatic curriculum strategy on task difficulty and (ii) a Hindsight Experience Replay strategy adapted to legged locomotion tasks.


Highly valued subgoal generation for efficient goal-conditioned reinforcement learning.

Neural Netw

January 2025

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, China.

Article Synopsis
  • Goal-conditioned reinforcement learning helps robots perform specific tasks by maximizing rewards, but it faces challenges due to sparse rewards that hinder the learning process.
  • The proposed method generates meaningful subgoals tailored to the context of tasks, allowing robots to learn more efficiently through better action value learning.
  • Compared to existing methods like Hindsight Experience Replay, this approach improves stability and performance in robotic tasks by creating subgoals that are contextually relevant and appropriately complex.

The journey of patients in cancer clinical trials: A qualitative meta-synthesis on experiences and perspectives.

Patient Educ Couns

January 2025

Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy; Applied Research Division for Cognitive and Psychological Science, IEO European Institute of Oncology IRCCS, Milan, Italy.

Article Synopsis
  • The study aimed to synthesize qualitative research on adult cancer patients' experiences and perspectives regarding clinical trials using a meta-synthesis of 45 papers.
  • Three main themes were identified related to the trial timeline: pre-trial participation (information needs and decision-making), ongoing trials (supportive care and maintaining hope), and post-trial (understanding results and feelings of neglect).
  • The conclusion highlights the need for more focus on post-trial experiences to support patients' well-being and reduce dropout rates, advocating for better communication and remote options to improve participation.
