PAC-Bayes has recently re-emerged as an effective theory with which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is a great misfortune. Many decision-making problems in healthcare, finance and the natural sciences can be modelled as bandit problems. In many of these applications, principled algorithms with strong performance guarantees would be very much appreciated. This survey provides an overview of PAC-Bayes bounds for bandit problems and an experimental comparison of these bounds. On the one hand, we found that PAC-Bayes bounds are a useful tool for designing offline bandit algorithms with performance guarantees. In our experiments, a PAC-Bayesian offline contextual bandit algorithm was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees. On the other hand, the PAC-Bayesian online bandit algorithms that we tested had loose cumulative regret bounds. We conclude by discussing some topics for future work on PAC-Bayesian bandit algorithms.
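
To make "non-vacuous performance guarantee" concrete, the sketch below evaluates a generic McAllester-style PAC-Bayes bound (in the form refined by Maurer) for rewards in [0, 1]. It is an illustration only and is not one of the bounds compared in the survey: it assumes n i.i.d. reward samples, a data-independent prior, and diagonal Gaussian prior and posterior over policy parameters. The function names are hypothetical. Offline bandit bounds built on logged data typically also need importance weighting, which this sketch omits.

```python
import math


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for diagonal Gaussians, given per-coordinate means and std devs."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def reward_lower_bound(empirical_reward, kl, n, delta):
    """Lower bound on expected reward that holds with probability >= 1 - delta.

    Assumes rewards in [0, 1], n >= 8 i.i.d. samples and a prior chosen
    independently of the data (assumptions of this sketch, not of the survey).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if n < 8:
        raise ValueError("this form of the bound requires n >= 8")
    slack = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    # A bound at or below 0 is vacuous for rewards in [0, 1]; clip at 0.
    return max(0.0, empirical_reward - slack)


# Hypothetical numbers: posterior shifted by 1 in a single coordinate.
kl = kl_diag_gaussians([1.0], [1.0], [0.0], [1.0])
bound = reward_lower_bound(empirical_reward=0.7, kl=kl, n=10_000, delta=0.05)
print(f"KL = {kl:.3f}, reward lower bound = {bound:.4f}")
```

The bound is non-vacuous whenever it stays above 0, and it tightens as n grows or as the posterior stays closer to the prior (smaller KL).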

Source
http://dx.doi.org/10.1109/TPAMI.2023.3305381
