Communication-based multiagent reinforcement learning (MARL) has shown promising results in promoting cooperation by enabling agents to exchange information. However, existing methods have limitations in large-scale multiagent systems because of high information redundancy, and they tend to overlook the unstable training process caused by an online-trained communication protocol. In this work, we propose a novel method called neighboring variational information flow (NVIF), which enhances communication among neighboring agents by providing them with the maximum information set (MIS), a set that contains more information than those used by existing methods. NVIF compresses the MIS into a compact latent state while adopting neighboring communication. To stabilize the overall training process, we introduce a two-stage training mechanism: we first pretrain the NVIF module on a randomly sampled offline dataset to create a task-agnostic and stable communication protocol, and then use the pretrained protocol to perform online policy training with RL algorithms. Our theoretical analysis indicates that NVIF-proximal policy optimization (PPO), which combines NVIF with PPO, has the potential to promote cooperation with agent-specific rewards. Experimental results demonstrate the superiority of our method in both heterogeneous and homogeneous settings, and additional experiments demonstrate its potential for multitask learning.
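
The abstract does not give NVIF's architecture, so what follows is only a minimal sketch, under stated assumptions, of the general idea it describes: each agent gathers the observations of nearby agents, and a variational encoder compresses that set into a compact latent state, trained with a reconstruction-plus-KL objective. The radius-based neighbor rule, the mean pooling, the linear Gaussian encoder and decoder, and every name used (`neighbor_set`, `pool`, `LinearGaussianEncoder`, `elbo_loss`) are hypothetical illustrations in plain Python, not the authors' design.

```python
# Illustrative sketch only: NOT the NVIF architecture from the paper.
# All names, the neighbor rule, and the linear encoder/decoder are assumptions.
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def neighbor_set(positions: Sequence[Tuple[float, float]],
                 obs: Sequence[Vector], agent: int, radius: float) -> List[Vector]:
    """Hypothetical neighbor rule: observations of all agents within `radius` (self included)."""
    ax, ay = positions[agent]
    return [list(obs[j]) for j, (x, y) in enumerate(positions)
            if math.hypot(x - ax, y - ay) <= radius]


def pool(msgs: Sequence[Vector]) -> Vector:
    """Mean-pool a variable-size set into one fixed-size vector (permutation invariant)."""
    dim = len(msgs[0])
    return [sum(m[d] for m in msgs) / len(msgs) for d in range(dim)]


def matvec(w: Sequence[Vector], x: Sequence[float]) -> Vector:
    """Dense matrix-vector product using plain lists."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class LinearGaussianEncoder:
    """Hypothetical linear encoder q(z | x) = N(mu, diag(exp(logvar)))."""

    def __init__(self, in_dim: int, latent_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_mu = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(latent_dim)]
        self.w_logvar = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(latent_dim)]

    def encode(self, x: Vector) -> Tuple[Vector, Vector]:
        return matvec(self.w_mu, x), matvec(self.w_logvar, x)


def reparameterize(mu: Vector, logvar: Vector, rng: random.Random) -> Vector:
    # z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Vector, logvar: Vector) -> float:
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def elbo_loss(encoder: LinearGaussianEncoder, decoder_w: Sequence[Vector],
              x: Vector, rng: random.Random, beta: float = 1.0) -> float:
    # Negative-ELBO-style loss: squared reconstruction error + beta * KL term
    mu, logvar = encoder.encode(x)
    z = reparameterize(mu, logvar, rng)
    x_hat = matvec(decoder_w, z)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + beta * kl_to_standard_normal(mu, logvar)


# Usage: three agents on a plane; agent 0 only "hears" agent 1.
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
obs = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
rng = random.Random(42)
mis = pool(neighbor_set(positions, obs, agent=0, radius=2.0))  # pools agents 0 and 1
enc = LinearGaussianEncoder(in_dim=2, latent_dim=4)
dec = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(2)]  # 2 x 4 decoder weights
mu, logvar = enc.encode(mis)
z = reparameterize(mu, logvar, rng)  # compact latent state an agent's policy could consume
print(mis, len(z), elbo_loss(enc, dec, mis, rng))
```

In a two-stage scheme like the one described, a loss of this kind would first be minimized on an offline dataset; the encoder would then be frozen and its latent `z` fed to a policy trained with PPO. The training loop itself is omitted here, and mean pooling is just one simple choice: it discards the number of neighbors, which a richer set encoder could retain.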

Source: http://dx.doi.org/10.1109/TNNLS.2023.3309608

Similar Publications

Machine learning methods have been important in the study of phase transitions. Unsupervised methods are particularly attractive because they do not require prior knowledge of the existence of a phase transition. In this work we focus on the constant magnetization Ising model in two (2D) and three (3D) dimensions.

Wearable technology enables the unsupervised recording of electrocardiogram (ECG) signals. Analyzing these high-dimensional ECG data poses challenges for statistical approaches and for explainability. This work investigates the feasibility of medically explainable anomaly detection through disentangled representation learning of ECGs and through personalization to mitigate inter-subject variations.

In this paper, the state estimation problem for physical plants with unknown system dynamics is revisited from the perspective of limited output measurements, a situation typical of systems whose outputs are high-dimensional, cover a wide area, and are spatially scattered. In this setting, a network of sensors is used to carry out the measurement, with each sensor accessing only part of the target system's outputs, and a novel model-free state estimation approach, named distributed stochastic variational inference state estimation, is proposed. The key idea of this method is to compensate for the limitations of purely local output measurements by adding nearest-neighbor rule-based information interaction among the estimators to complete the state estimation.

Variational Approaches for Drug-Disease-Gene Links in Periodontal Inflammation.

Int Dent J

October 2024

Carlos-M. Ardila, DDS, Periodontist, Ph.D. in Epidemiology, Postdoc in Bioethics, Titular Professor, Universidad de Antioquia U de A, Medellín, Colombia; Biomedical Stomatology Research Group, Universidad de Antioquia U de A, Medellín, Colombia.

Article Synopsis
  • The study explores the connection between oral diseases like gingivitis and periodontitis and the Wnt signaling pathway, which is essential for bone-related processes.
  • It compares the predictive capabilities of two advanced AI techniques, variational autoencoders (VAEs) and quantum variational classifiers (QVCs), in modeling gene associations relevant to drug treatments for these conditions.
  • Results indicate that both models can effectively identify gene-drug associations linked to the Wnt pathway, offering potential advancements in targeted therapies for periodontal inflammation.

Single-cell RNA sequencing (scRNA-seq) is now a successful technology for identifying cell heterogeneity, revealing new cell subpopulations, and predicting developmental trajectories. A crucial component of scRNA-seq analysis is the precise identification of cell subsets. Although many unsupervised methods have been developed for clustering cell subpopulations, their performance is prone to degradation from dropout, high dimensionality, and technical noise.
