Statistical Inference with M-Estimators on Adaptively Collected Data.

Kelly W Zhang, Lucas Janson, Susan A Murphy

Adv Neural Inf Process Syst

Departments of Statistics and Computer Science, Harvard University.

Published: December 2021

Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators (which include estimators based on empirical risk minimization as well as maximum likelihood) on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets.
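The abstract does not spell out the "particular adaptive weights", so the following is only an illustrative sketch under stated assumptions: it uses square-root importance weights, sqrt(pi_stable(a) / pi_t(a)), with a uniform stabilizing policy, applied to a least-squares M-estimator of arm means in a two-armed epsilon-greedy bandit. The function names, the epsilon-greedy data-collection rule, and the choice of weights are hypothetical and are not taken from the text above.

```python
import numpy as np


def simulate_epsilon_greedy(T, true_means, epsilon=0.1, seed=0):
    """Collect data with a simple epsilon-greedy bandit (assumed setup)."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    actions = np.empty(T, dtype=int)
    rewards = np.empty(T)
    probs = np.empty(T)  # probability with which the chosen arm was selected
    for t in range(T):
        # Running mean per arm; arms not yet pulled are treated as mean 0.
        means = np.divide(sums, counts, out=np.zeros(K), where=counts > 0)
        greedy = int(np.argmax(means))
        pi = np.full(K, epsilon / K)
        pi[greedy] += 1.0 - epsilon
        a = rng.choice(K, p=pi)
        r = rng.normal(true_means[a], 1.0)
        actions[t], rewards[t], probs[t] = a, r, pi[a]
        counts[a] += 1
        sums[a] += r
    return actions, rewards, probs


def weighted_least_squares(actions, rewards, probs, K, pi_stable=None):
    """Least-squares M-estimator of arm means with adaptive weights.

    Assumption: weights are sqrt(pi_stable(a) / pi_t(a)); the abstract
    does not state the exact form of the weights.
    """
    if pi_stable is None:
        pi_stable = np.full(K, 1.0 / K)  # uniform stabilizing policy (assumed)
    w = np.sqrt(pi_stable[actions] / probs)
    X = np.eye(K)[actions]  # one-hot arm indicators as features
    # Solve the weighted estimating equation sum_t w_t x_t (r_t - x_t' theta) = 0.
    A = X.T @ (w[:, None] * X)
    b = X.T @ (w * rewards)
    return np.linalg.solve(A, b)


if __name__ == "__main__":
    a, r, p = simulate_epsilon_greedy(T=5000, true_means=[0.0, 0.5])
    print(weighted_least_squares(a, r, p, K=2))
```

When every action probability equals the stabilizing policy, all weights are 1 and the estimator reduces to ordinary per-arm sample means. The sketch produces point estimates only; it does not construct the confidence regions described in the abstract.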

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9232184

Similar Publications

Thompson Sampling for Non-Stationary Bandit Problems.

Entropy (Basel)

January 2025

School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China.

Han Qi, Fei Guo, Li Zhu

Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson sampling (TS) has shown success in non-stationary settings, there is currently no regret bound analysis for TS with uninformative priors.

A Novel Hyper-Heuristic Algorithm with Soft and Hard Constraints for Causal Discovery Using a Linear Structural Equation Model.

Entropy (Basel)

January 2025

School of Electronic and Information, Northwestern Polytechnical University, Xi'an 710129, China.

Yinglong Dang, Xiaoguang Gao, Zidong Wang

Artificial intelligence plays an indispensable role in improving productivity and promoting social development, and causal discovery is one of the most important research directions in this field. Directed acyclic graphs (DAGs) are the most commonly used tool in causal modeling because of their excellent interpretability and structural properties. However, when data are insufficient, the accuracy and efficiency of DAG learning are greatly reduced, resulting in a false perception of causality.

Biologically plausible gated recurrent neural networks for working memory and learning-to-learn.

PLoS One

December 2024

Machine Learning Group, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands.

Alexandra R van den Berg, Pieter R Roelfsema, Sander M Bohte

The acquisition of knowledge and skills does not occur in isolation but learning experiences amalgamate within and across domains. The process through which learning can accelerate over time is referred to as learning-to-learn or meta-learning. While meta-learning can be implemented in recurrent neural networks, these networks tend to be trained with architectures that are not easily interpretable or mappable to the brain and with learning rules that are biologically implausible.

Exploiting full-duplex opportunities in WLANs via a reinforcement learning-based medium access control protocol.

Sci Rep

December 2024

National University of Defense Technology, Changsha, Hunan, China.

Song Liu, Peng Wei

In-band full-duplex communication has the potential to double the wireless channel capacity. However, how to efficiently transform the full-duplex gain at the physical layer into network throughput improvement is still a challenge, especially in dynamic communication environments. This paper presents a reinforcement learning-based full-duplex (RLFD) medium access control (MAC) protocol for wireless local-area networks (WLANs) with full-duplex access points.

Causal contextual bandits with one-shot data integration.

Front Artif Intell

December 2024

Robert Bosch Center for Data Science and Artificial Intelligence, Indian Institute of Technology Madras, Chennai, India.

Chandrasekar Subramanian, Balaraman Ravindran

We study a contextual bandit setting where the agent has access to causal side information, in addition to the ability to perform multiple targeted experiments, corresponding to potentially different context-action pairs, simultaneously in one shot within a budget. This new formalism provides a natural model for several real-world scenarios where parallel targeted experiments can be conducted and where some domain knowledge of causal relationships is available. We propose a new algorithm that utilizes a novel entropy-like measure that we introduce.
