PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison.

Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters

IEEE Trans Pattern Anal Mach Intell

Published: December 2023

PAC-Bayes has recently re-emerged as an effective theory with which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is a great misfortune. Many decision-making problems in healthcare, finance and natural sciences can be modelled as bandit problems. In many of these applications, principled algorithms with strong performance guarantees would be very much appreciated. This survey provides an overview of PAC-Bayes bounds for bandit problems and an experimental comparison of these bounds. On the one hand, we found that PAC-Bayes bounds are a useful tool for designing offline bandit algorithms with performance guarantees. In our experiments, a PAC-Bayesian offline contextual bandit algorithm was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees. On the other hand, the PAC-Bayesian online bandit algorithms that we tested had loose cumulative regret bounds. We conclude by discussing some topics for future work on PAC-Bayesian bandit algorithms.

DOI: http://dx.doi.org/10.1109/TPAMI.2023.3305381

Publication Analysis

Top Keywords

bandit problems

performance guarantees

pac-bayes bounds

bandit algorithms

bandit

bounds bandit

experimental comparison

pac-bayes

problems

algorithms

Similar Publications

Enhancing lane detection in autonomous vehicles with multi-armed bandit ensemble learning.

Sci Rep

January 2025

School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India.

J Arun Pandian, Ramkumar Thirunavukarasu, L Thanga Mariappan

This study introduces a novel ensemble learning technique, the Multi-Armed Bandit Ensemble (MAB-Ensemble), designed for lane detection in road images for autonomous vehicles. The proposed technique draws on multi-armed bandit optimization to select models efficiently for lane segmentation. The TuSimple benchmark dataset is used to train, validate and test the proposed and existing lane detection techniques.

Thompson Sampling for Non-Stationary Bandit Problems.

Entropy (Basel)

January 2025

School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

Han Qi, Fei Guo, Li Zhu

Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson sampling (TS) has shown success in non-stationary settings, there is currently no regret bound analysis for TS with uninformative priors.

A Novel Hyper-Heuristic Algorithm with Soft and Hard Constraints for Causal Discovery Using a Linear Structural Equation Model.

Entropy (Basel)

January 2025

School of Electronic and Information, Northwestern Polytechnical University, Xi'an 710129, China.

Yinglong Dang, Xiaoguang Gao, Zidong Wang

Artificial intelligence plays an indispensable role in improving productivity and promoting social development, and causal discovery is an important research direction in this field. Directed acyclic graphs (DAGs) are the most commonly used tool in causal modeling because of their excellent interpretability and structural properties. However, when data are insufficient, the accuracy and efficiency of DAG learning are greatly reduced, resulting in a false perception of causality.

Multifaceted confidence in exploratory choice.

PLoS One

January 2025

Department of Computational Neuroscience, Max Planck Institute for Biological Cybernetics, Tübingen, Germany.

Oleg Solopchuk, Peter Dayan

Our choices are typically accompanied by a feeling of confidence, an internal estimate that they are correct. Correctness, however, depends on our goals. For example, exploration-exploitation problems entail a tension between short- and long-term goals: finding out about the value of one option could mean foregoing another option that is apparently more rewarding.

Exploiting full-duplex opportunities in WLANs via a reinforcement learning-based medium access control protocol.

Sci Rep

December 2024

National University of Defense Technology, Changsha, Hunan, China.

Song Liu, Peng Wei

In-band full-duplex communication has the potential to double the wireless channel capacity. However, how to efficiently transform the full-duplex gain at the physical layer into network throughput improvement is still a challenge, especially in dynamic communication environments. This paper presents a reinforcement learning-based full-duplex (RLFD) medium access control (MAC) protocol for wireless local-area networks (WLANs) with full-duplex access points.
