In recent years, dynamic programming and reinforcement learning theory have been widely used to solve problems in nonlinear control systems (NCS). Many results have been achieved in constructing network models and analyzing system stability, but little research has addressed how to build a control strategy around the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM), which constructs a reward function from individual detail evaluation functions to replace the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. The method is then extended to a broader range of deep reinforcement learning algorithms to solve optimization problems in NCS. After a mathematical description of the relevant characteristics of NCS, the stability of the iterative control law is proved using a Lyapunov function. With the inverted pendulum system as the experimental object, a dynamic environment is designed and a reward function is established using the DRM. Finally, three deep reinforcement learning models, based on Deep Q-Networks, policy gradient and actor-critic respectively, are designed in the dynamic environment, and the effects of different reward functions on experimental accuracy are compared. The experimental results show that, in NCS, using the DRM to replace the utility function in the HJB equation better meets the designer's detailed requirements for the whole control process. By observing the characteristics of the system, designing the reward function and selecting an appropriate deep reinforcement learning model, the optimization problem of NCS can be solved.
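The abstract does not list the paper's actual detail evaluation functions or weights, so the following is only a minimal sketch of the general idea, assuming an inverted pendulum on a cart: each "detail" (staying upright, staying near the track centre, swinging smoothly, using little force) is scored separately in [0, 1], and the reward is a weighted sum of those scores. All function names, weights and limits here are hypothetical; the ±12° angle limit, ±2.4 m track limit and 10 N force magnitude are borrowed from the common CartPole benchmark, not from the paper. In a cost-minimizing HJB form, such a reward could be turned into a non-negative utility, for example U = 1 - r (again an assumption); the DQN, policy-gradient and actor-critic agents would use r directly.

```python
import math
from dataclasses import dataclass


@dataclass
class PendulumState:
    x: float          # cart position (m)
    x_dot: float      # cart velocity (m/s)
    theta: float      # pole angle from upright (rad)
    theta_dot: float  # pole angular velocity (rad/s)


# Hypothetical detail evaluation functions: each returns a score in [0, 1].
def angle_detail(s: PendulumState, theta_limit: float = math.radians(12)) -> float:
    # 1.0 when perfectly upright, falling linearly to 0.0 at the angle limit.
    return max(0.0, 1.0 - abs(s.theta) / theta_limit)


def position_detail(s: PendulumState, x_limit: float = 2.4) -> float:
    # 1.0 at the track centre, 0.0 at (or beyond) the track edge.
    return max(0.0, 1.0 - abs(s.x) / x_limit)


def smoothness_detail(s: PendulumState, omega_scale: float = 2.0) -> float:
    # Gaussian-shaped score that drops as the pole swings faster.
    return math.exp(-((s.theta_dot / omega_scale) ** 2))


def effort_detail(u: float, u_max: float = 10.0) -> float:
    # 1.0 for zero applied force, 0.0 at the maximum force magnitude.
    return 1.0 - min(abs(u), u_max) / u_max


def detail_reward(s: PendulumState, u: float,
                  weights: tuple = (0.5, 0.2, 0.2, 0.1)) -> float:
    """Weighted sum of detail scores; weights summing to 1 keep r in [0, 1]."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    details = (angle_detail(s), position_detail(s),
               smoothness_detail(s), effort_detail(u))
    return sum(w * d for w, d in zip(weights, details))


if __name__ == "__main__":
    upright = PendulumState(x=0.0, x_dot=0.0, theta=0.0, theta_dot=0.0)
    tilted = PendulumState(x=0.0, x_dot=0.0, theta=0.5, theta_dot=0.0)
    print(detail_reward(upright, 0.0), detail_reward(tilted, 0.0))
```

Splitting the reward this way lets a designer trade off individual requirements (for example, raising the effort weight to favour gentler control) without rewriting the whole reward, which is the kind of per-detail control the DRM is described as providing.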

Source: http://dx.doi.org/10.3934/mbe.2022430 (DOI)

Publication Analysis

Top Keywords (number of occurrences)

reinforcement learning: 20
deep reinforcement: 16
reward function: 12
detail-reward mechanism: 8
detailed requirements: 8
control process: 8
replace utility: 8
utility function: 8
hjb equation: 8
dynamic environment: 8

Similar Publications

Transitive inference, the ability to establish hierarchical relationships between stimuli, is typically tested by training with premise pairs (e.g., A + B-, B + C-, C + D-, D + E-), which establishes a stimulus hierarchy (A > B > C > D > E).

Recent research has highlighted a notable confidence bias in the haptic sense, yet its impact on learning relative to other senses remains unexplored. This online study investigated learning behaviour across visual, auditory, and haptic modalities using a probabilistic selection task on computers and mobile devices, employing dynamic and ecologically valid stimuli to enhance generalisability. We analysed reaction time as an indicator of confidence, alongside learning speed and task accuracy.

The power of belief? Evidence of reduced fear extinction learning in Catholic God believers.

Front Public Health

January 2025

Dipartimento di Scienze Cognitive, Psicologiche, Pedagogiche e Degli Studi Culturali, Università di Messina, Messina, Italy.

Religious beliefs can shape how people process fear. Yet the psychophysiological mechanisms underlying this phenomenon remain poorly understood. We investigated fear learning and extinction processes in a group of individuals who professed a belief in God, compared to non-believers.

Those with diabetes mellitus are at high risk of developing psychiatric disorders, especially mood disorders, yet the link between hyperglycemia and altered motivation has not been thoroughly explored. Here, we characterized the value-based decision-making behavior of a streptozocin-induced diabetic mouse model on Restaurant Row, a naturalistic neuroeconomic foraging paradigm capable of behaviorally capturing multiple decision systems known to depend on dissociable neural circuits. Mice made self-paced choices on a limited daily time budget, accepting or rejecting reward offers based on cost (delays cued by tone pitch) and subjective value (flavors), in a closed-economy system tested across months.

Traditional decision-making models conceptualize humans as adaptive learners who use the differences between expected and actual rewards (prediction errors, PEs) to maximize outcomes, but they rarely consider the influence of violated emotional expectations (emotional PEs) or how these differ from reward PEs. Here, we conducted an fMRI experiment (n = 43) using a modified Ultimatum Game to examine how reward and emotional PEs affect punishment decisions, that is, the rejection of unfair offers. Our results revealed that reward PEs predicted punishment decisions more strongly than emotional PEs did.
