In recent years, dynamic programming and reinforcement learning theory have been widely used to solve control problems in nonlinear control systems (NCS). Much progress has been made in constructing network models and analyzing system stability, but little research has addressed how to establish a control strategy based on the detailed requirements of the control process. Motivated by this gap, this paper proposes a detail-reward mechanism (DRM), which constructs a reward function from individual detail evaluation functions and uses it to replace the utility function in the Hamilton-Jacobi-Bellman (HJB) equation. The method is then extended to a wider range of deep reinforcement learning algorithms to solve optimization problems in NCS. After a mathematical description of the relevant characteristics of NCS, the stability of the iterative control law is proved using a Lyapunov function. With the inverted pendulum system as the experimental object, a dynamic environment is designed and the reward function is established using the DRM. Finally, three deep reinforcement learning models, based on Deep Q-Networks, policy gradient, and actor-critic, are designed in the dynamic environment, and the effects of different reward functions on experimental accuracy are compared. The experimental results show that, in NCS, using the DRM to replace the utility function in the HJB equation better matches the designer's detailed requirements for the whole control process. The optimization problem of NCS can be solved by observing the characteristics of the system, designing the reward function, and selecting an appropriate deep reinforcement learning model.
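The abstract does not give the form of the detail evaluation functions, so the following is only a minimal sketch of how a reward composed of individual detail evaluations might look for an inverted pendulum. Every function name, weight, scale constant, and the choice of a weighted sum are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical cart-pole style state:
# (cart position [m], cart velocity [m/s], pole angle [rad], pole angular velocity [rad/s])
State = Tuple[float, float, float, float]
DetailFn = Callable[[State, float], float]

MAX_FORCE = 10.0  # assumed actuator limit, for illustration only


def angle_detail(state: State, action: float) -> float:
    """Score in (0, 1]: 1.0 when the pole is upright, decaying as it tilts."""
    theta = state[2]
    return math.exp(-((theta / 0.2) ** 2))


def position_detail(state: State, action: float) -> float:
    """Score in (0, 1]: 1.0 when the cart is centred on the track."""
    x = state[0]
    return math.exp(-(x ** 2))


def effort_detail(state: State, action: float) -> float:
    """Score in [0, 1]: 1.0 for zero control force, 0.0 at the assumed limit."""
    return 1.0 - min(abs(action) / MAX_FORCE, 1.0)


# Assumed weights expressing how much the designer cares about each detail.
DETAILS: Dict[str, Tuple[float, DetailFn]] = {
    "angle": (0.6, angle_detail),
    "position": (0.3, position_detail),
    "effort": (0.1, effort_detail),
}


def detail_reward(state: State, action: float) -> float:
    """Combine the individual detail evaluations into one scalar reward."""
    return sum(weight * fn(state, action) for weight, fn in DETAILS.values())
```

In a sketch like this, `detail_reward` would stand in for the per-step utility term, so a Deep Q-Network, policy gradient, or actor-critic agent could use it as its reward signal; changing the weights or adding a detail function changes which aspects of the control process the learned policy favors.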
DOI: http://dx.doi.org/10.3934/mbe.2022430
Front Psychol
January 2025
Sorbonne University, CNRS, INSERM, Institute of Biology Paris Seine, Neurosciences Paris Seine, Paris, France.
Transitive inference, the ability to establish hierarchical relationships between stimuli, is typically tested by training with premise pairs (e.g., A + B-, B + C-, C + D-, D + E-), which establishes a stimulus hierarchy (A > B > C > D > E).
Heliyon
January 2025
Centre for Tactile Internet with Human-in-the-Loop (CeTI), 6G Life, Technische Universität Dresden, Germany.
Recent research has highlighted a notable confidence bias in the haptic sense, yet its impact on learning relative to other senses remains unexplored. This online study investigated learning behaviour across visual, auditory, and haptic modalities using a probabilistic selection task on computers and mobile devices, employing dynamic and ecologically valid stimuli to enhance generalisability. We analysed reaction time as an indicator of confidence, alongside learning speed and task accuracy.
Front Public Health
January 2025
Dipartimento di Scienze Cognitive, Psicologiche, Pedagogiche e Degli Studi Culturali, Università di Messina, Messina, Italy.
Religious beliefs can shape how people process fear. Yet the psychophysiological mechanisms underlying this phenomenon remain poorly understood. We investigated fear learning and extinction processes in a group of individuals who professed a belief in God, compared to non-believers.
Commun Biol
January 2025
Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
People with diabetes mellitus are at high risk of developing psychiatric disorders, especially mood disorders, yet the link between hyperglycemia and altered motivation has not been thoroughly explored. Here, we characterized the value-based decision-making behavior of a streptozocin-induced diabetic mouse model on Restaurant Row, a naturalistic neuroeconomic foraging paradigm capable of behaviorally capturing multiple decision systems known to depend on dissociable neural circuits. Mice made self-paced choices on a daily limited time budget, accepting or rejecting reward offers based on cost (delays cued by tone pitch) and subjective value (flavors), in a closed-economy system tested across months.
Commun Biol
January 2025
Department of Psychology, The University of Hong Kong, Hong Kong, China.
Traditional decision-making models conceptualize humans as adaptive learners who use the differences between expected and actual rewards (prediction errors, PEs) to maximize outcomes, but rarely consider the influence of violations of emotional expectations (emotional PEs) and how they differ from reward PEs. Here, we conducted an fMRI experiment (n = 43) using a modified Ultimatum Game to examine how reward and emotional PEs affect punishment decisions, measured as the rejection of unfair offers. Our results revealed that reward PEs predicted punishment decisions more strongly than emotional PEs did.