Stochastic multiarmed bandits (stochastic MABs) are a problem of sequential decision-making with noisy rewards, where an agent sequentially chooses actions under unknown reward distributions to minimize cumulative regret. The majority of prior works on stochastic MABs assume that the reward distribution of each action has bounded supports or follows light-tailed distribution, i.e., sub-Gaussian distribution. However, in a variety of decision-making problems, the reward distributions follow a heavy-tailed distribution. In this regard, we consider stochastic MABs with heavy-tailed rewards, whose p th moment is bounded by a constant ν for . First, we provide theoretical analysis on sub-optimality of the existing exploration methods for heavy-tailed rewards where it has been proven that existing exploration methods do not guarantee a minimax optimal regret bound. Second, to achieve the minimax optimality under heavy-tailed rewards, we propose a minimax optimal robust upper confidence bound (MR-UCB) by providing tight confidence bound of a p -robust estimator. Furthermore, we also propose a minimax optimal robust adaptively perturbed exploration (MR-APE) which is a randomized version of MR-UCB. In particular, unlike the existing robust exploration methods, both proposed methods have no dependence on ν . Third, we provide the gap-dependent and independent regret bounds of proposed methods and prove that both methods guarantee the minimax optimal regret bound for a heavy-tailed stochastic MAB problem. The proposed methods are the first algorithm that theoretically guarantees the minimax optimality under heavy-tailed reward settings to the best of our knowledge. Finally, we demonstrate the superiority of the proposed methods in simulation with Pareto and Fréchet noises with respect to regrets.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TNNLS.2022.3203035 | DOI Listing |
Entropy (Basel)
November 2024
National Key Laboratory of Intelligent Tracking and Forecasting for Infectious Diseases (NITFID), School of Statistics and Data Science, Nankai University, Tianjin 300071, China.
Data-driven decision-making has become crucial across various domains. Randomization and re-randomization are standard techniques employed in controlled experiments to estimate causal effects in the presence of numerous pre-treatment covariates. This paper quantifies the worst-case mean squared error of the difference-in-means estimator as a generalized discrepancy of covariates between treatment and control groups.
View Article and Find Full Text PDFHeliyon
June 2024
School of Electronics and Information, Zhengzhou University of Aeronautics, Zhengzhou, 450046, China.
In this paper, on the premise that the prior probability is unknown, a noise enhanced binary hypothesis-testing is investigated under the Minimax criterion for a general nonlinear system. Firstly, for lowering the decision risk, an additive noise is intentionally injected to the input and a decision is made under Minimax criterion based on the noise modified output. Then an optimization problem for minimizing the maximum of Bayesian conditional risk under an equality constraint is formulated via analyzing the relationship between the additive noise and the optimal noise modified Minimax decision rule.
View Article and Find Full Text PDFEntropy (Basel)
November 2024
The Viterbi Faculty of Electrical and Computer Engineering, Technion-Israel Institute of Technology, Technion City, Haifa 3200003, Israel.
We propose two families of asymptotically local minimax lower bounds on parameter estimation performance. The first family of bounds applies to any convex, symmetric loss function that depends solely on the difference between the estimate and the true underlying parameter value (i.e.
View Article and Find Full Text PDFJ Am Stat Assoc
September 2023
Department of Statistics, Texas A&M University.
In a traditional Gaussian graphical model, data homogeneity is routinely assumed with no extra variables affecting the conditional independence. In modern genomic datasets, there is an abundance of auxiliary information, which often gets under-utilized in determining the joint dependency structure. In this article, we consider a Bayesian approach to model undirected graphs underlying heterogeneous multivariate observations with additional assistance from covariates.
View Article and Find Full Text PDFActa Oncol
October 2024
Université catholique de Louvain, Institut de recherche expérimentale et clinique, Molecular Imaging and Radiation Oncology (MIRO) Laboratory, Brussels, Belgium; KULeuven, Department of Oncology, Laboratory of external radiotherapy, Leuven, Belgium; Particle Therapy Interuniversity Center Leuven - PARTICLE, Leuven, Belgium.
Background And Purpose: Intensity Modulated Proton Therapy (IMPT) faces challenges in lung cancer treatment, like maintaining plan robustness for moving tumors against setup, range errors, and interplay effects. Proton Arc Therapy (PAT) is an alternative to maintain target coverage, potentially improving organ at risk (OAR) sparing, reducing beam delivery time (BDT), and enhancing patient experience. We aim to perform a systematic plan comparison study between IMPT and energy layer (EL) and spot assignment algorithm - Proton Arc Therapy (ELSA-PAT) to assess its potential for lung cancer treatment.
View Article and Find Full Text PDFEnter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!