Reinforcement learning has recently been studied in various fields and is also used to optimally control IoT devices, which extend Internet connectivity beyond standard devices. In this paper, we allow multiple reinforcement learning agents to learn optimal control policies on their own IoT devices, which are of the same type but have slightly different dynamics. For such devices, there is no guarantee that an agent that interacts with only one IoT device and learns its optimal control policy will also control another IoT device well. Therefore, we may need to apply independent reinforcement learning to each IoT device individually, which is costly and time-consuming. To solve this problem, we propose a new federated reinforcement learning architecture in which each agent, working on its own independent IoT device, shares its learning experience (i.e., the gradient of the loss function) with the other agents and transfers mature policy model parameters to them. The receiving agents accelerate their learning by using these mature parameters. We incorporate the actor-critic proximal policy optimization (Actor-Critic PPO) algorithm into each agent in the proposed collaborative architecture and propose an efficient procedure for gradient sharing and model transfer. Using multiple rotary inverted pendulum devices interconnected via a network switch, we demonstrate that the proposed federated reinforcement learning scheme can effectively facilitate the learning process for multiple IoT devices and that learning can be faster when more agents are involved.
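The two mechanisms named in the abstract, gradient sharing and model transfer, can be illustrated with a toy sketch. This is not the paper's procedure: the `Worker` class, the quadratic loss, the use of a plain average of gradients, and the loss `threshold` that marks a model as "mature" are all assumptions made for illustration. Each worker stands in for one agent and its device, and a single scalar `theta` stands in for the policy parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    """Hypothetical stand-in for one agent attached to one device."""
    target: float        # device-specific optimum (slightly different dynamics)
    theta: float = 0.0   # one scalar "policy parameter", for illustration only

    def loss(self) -> float:
        # Toy quadratic loss; a real agent would use the PPO objective.
        return 0.5 * (self.theta - self.target) ** 2

    def gradient(self) -> float:
        # Derivative of the toy loss with respect to theta.
        return self.theta - self.target


def share_gradients(workers: List[Worker], lr: float = 0.1) -> None:
    """Gradient sharing (assumed here to be a plain average):
    every worker steps along the mean of all workers' gradients."""
    grads = [w.gradient() for w in workers]
    mean_grad = sum(grads) / len(grads)
    for w in workers:
        w.theta -= lr * mean_grad


def transfer_mature_model(workers: List[Worker], threshold: float) -> bool:
    """Model transfer: if the best worker's loss is at or below the
    assumed maturity threshold, copy its parameter to every worker."""
    best = min(workers, key=lambda w: w.loss())
    if best.loss() > threshold:
        return False
    for w in workers:
        w.theta = best.theta
    return True


if __name__ == "__main__":
    # Three devices of the same type with slightly different optima.
    workers = [Worker(0.9, 0.0), Worker(1.0, 0.5), Worker(1.1, -0.5)]
    for _ in range(100):
        share_gradients(workers, lr=0.1)
    transferred = transfer_mature_model(workers, threshold=0.01)
    print(transferred, [round(w.theta, 3) for w in workers])
```

In this toy setting, the shared gradient moves all workers toward the average optimum, while the transfer step lets workers that started far away adopt the parameters of the one that is already performing well. After a transfer, each worker would continue learning on its own device to adapt to its particular dynamics.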
Download full-text PDF | Source |
---|---|
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7085801 | PMC |
http://dx.doi.org/10.3390/s20051359 | DOI Listing |
ACS Chem Neurosci
January 2025
Departments of Psychiatry and Neurology, Division of Molecular Therapeutics, New York State Psychiatric Institute, Columbia University Medical Center, New York, New York 10032, United States.
Voluntary movement, motivation, and reinforcement learning depend on the activity of ventral midbrain neurons, which extend axons to release dopamine (DA) in the striatum. These neurons exhibit two patterns of action potential activity: low-frequency tonic activity that is intrinsically generated and superimposed high-frequency phasic bursts that are driven by synaptic inputs. Ex vivo acute striatal brain preparations are widely employed to study the regulation of evoked DA release but exhibit very different DA release kinetics than in vivo recordings.
Womens Health (Lond)
January 2025
College of Nursing, University of Utah, Salt Lake City, UT, USA.
Background: Postpartum is a critical period to interrupt weight gain across the lifespan, decrease weight-related risk in future pregnancies, promote healthy behaviors that are often adopted during pregnancy, and improve long-term health. Because the postpartum period is marked by unique challenges to a person's ability to prioritize healthy behaviors, a multi-level/domain approach to intervention beyond the individual-level factors of diet and activity is needed.
Objectives: The purpose of this study was to understand postpartum people's perceptions about the relationship between their social networks and support, and their health behaviors and weight.
Sensors (Basel)
January 2025
Group of Analysis, Security and Systems (GASS), Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), Calle Profesor José García Santesmases, 9, Ciudad Universitaria, 28040 Madrid, Spain.
Conducting penetration testing (pentesting) in cybersecurity is a crucial turning point for identifying vulnerabilities within the framework of Information Technology (IT), where real malicious offensive behavior is simulated to identify potential weaknesses and strengthen preventive controls. Given the complexity of the tests, time constraints, and the specialized level of expertise required for pentesting, analysis and exploitation tools are commonly used. Although useful, these tools often introduce uncertainty in findings, resulting in high rates of false positives.
Sensors (Basel)
January 2025
Key Laboratory of Automotive Power Train and Electronics, Hubei University of Automotive Technology, Shiyan 442002, China.
Autonomous driving has demonstrated impressive driving capabilities, with behavior decision-making playing a crucial role as a bridge between perception and control. Imitation Learning (IL) and Reinforcement Learning (RL) have introduced innovative approaches to behavior decision-making in autonomous driving, but challenges remain. On one hand, RL's policy networks often lack sufficient reasoning ability to make optimal decisions in highly complex and stochastic environments.
Sensors (Basel)
December 2024
School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China.
Unmanned aerial vehicles (UAVs) furnished with computational servers enable user equipment (UE) to offload complex computational tasks, thereby addressing the limitations of edge computing in remote or resource-constrained environments. The application of value decomposition algorithms for UAV trajectory planning has drawn considerable research attention. However, existing value decomposition algorithms commonly encounter obstacles in effectively associating local observations with the global state of UAV clusters, which hinders their task-solving capabilities and gives rise to reduced task completion rates and prolonged convergence times.