Learning parametric policies and transition probability models of Markov decision processes from data.

Eur J Control

Department of Electrical and Computer Engineering, Division of Systems Engineering, and Department of Biomedical Engineering, Boston University, 8 St. Mary's St., Boston, MA 02215.

Published: January 2021

We consider the problem of estimating the policy and transition probability model of a Markov Decision Process from data (state, action, next-state tuples). The transition probability and policy are assumed to be parametric functions of a sparse set of features associated with the tuples. We propose two regularized maximum likelihood estimation algorithms for learning the transition probability model and the policy, respectively. An upper bound is established on the regret, which is the difference between the average reward of the estimated policy under the estimated transition probabilities and that of the original unknown policy under the true (unknown) transition probabilities. We provide a sample complexity result showing that a low regret can be achieved with a relatively small number of training samples. We illustrate the theoretical results with a healthcare example and a robot navigation experiment.
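The abstract does not spell out the exact form of the estimators. As an illustration only, the Python sketch below assumes that both the policy π(a | s) and the transition model P(s' | s, a) are softmax (multinomial-logit) functions of a feature vector, fits each by ℓ1-regularized maximum likelihood using proximal gradient descent (ISTA), and measures regret as the gap between long-run average rewards. The helper names (`fit_l1_softmax`, `average_reward`, `simulate`, `run_example`), the toy MDP, the one-hot features, and taking the regret in absolute value are all hypothetical choices, not the paper's formulation.

```python
import math
import random

# Minimal sketch (assumption): policy pi(a | s) and transitions P(s' | s, a) are both
# modeled as p(y | x; W) ∝ exp(W[y] · x) and fitted by l1-regularized maximum likelihood.
# All names below are hypothetical illustrations, not taken from the paper.


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def predict_proba(W, x):
    return softmax([sum(wj * xj for wj, xj in zip(w, x)) for w in W])


def soft_threshold(v, t):
    # Proximal operator of t * |v|: small weights are set exactly to zero (sparsity).
    return math.copysign(max(abs(v) - t, 0.0), v)


def fit_l1_softmax(X, y, n_classes, lam=1e-3, step=0.5, iters=300):
    """Minimize (1/n) * sum_i -log p(y_i | x_i; W) + lam * ||W||_1 via ISTA."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(iters):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = predict_proba(W, x)
            for k in range(n_classes):
                # Gradient of -log p_label w.r.t. W[k] is (p_k - 1[k == label]) * x.
                coef = (p[k] - (1.0 if k == label else 0.0)) / n
                for j in range(d):
                    grad[k][j] += coef * x[j]
        # Gradient step on the likelihood term, then the l1 proximal step.
        W = [[soft_threshold(W[k][j] - step * grad[k][j], step * lam)
              for j in range(d)] for k in range(n_classes)]
    return W


def average_reward(pi, P, r, iters=1000):
    """Long-run average reward of pi[s][a] under P[s][a][s'] and r[s][a] (unichain assumed)."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        # Lazy chain (I + P_pi) / 2: same stationary distribution, guaranteed aperiodic.
        nxt = [0.5 * m for m in mu]
        for s in range(n):
            for a, pa in enumerate(pi[s]):
                for s2, ps in enumerate(P[s][a]):
                    nxt[s2] += 0.5 * mu[s] * pa * ps
        mu = nxt
    return sum(mu[s] * pa * r[s][a] for s in range(n) for a, pa in enumerate(pi[s]))


def simulate(pi, P, n_samples, seed=0):
    """Draw (state, action, next state) tuples along a single trajectory."""
    rng = random.Random(seed)
    s, data = 0, []
    for _ in range(n_samples):
        a = rng.choices(range(len(pi[s])), weights=pi[s])[0]
        s2 = rng.choices(range(len(P[s][a])), weights=P[s][a])[0]
        data.append((s, a, s2))
        s = s2
    return data


def one_hot(i, k):
    return [1.0 if j == i else 0.0 for j in range(k)]


def run_example(n_samples=500, seed=0):
    # Toy 2-state, 2-action MDP with made-up numbers, for illustration only.
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]]
    pi = [[0.8, 0.2], [0.3, 0.7]]
    r = [[1.0, 0.0], [0.0, 2.0]]
    data = simulate(pi, P, n_samples, seed)
    # Policy features: one-hot state. Transition features: one-hot (state, action) pair.
    W_pi = fit_l1_softmax([one_hot(s, 2) for s, _, _ in data], [a for _, a, _ in data], 2)
    W_P = fit_l1_softmax([one_hot(2 * s + a, 4) for s, a, _ in data],
                         [s2 for _, _, s2 in data], 2)
    pi_hat = [predict_proba(W_pi, one_hot(s, 2)) for s in range(2)]
    P_hat = [[predict_proba(W_P, one_hot(2 * s + a, 4)) for a in range(2)] for s in range(2)]
    # Regret: estimated policy under estimated model vs. true policy under true model.
    regret = abs(average_reward(pi_hat, P_hat, r) - average_reward(pi, P, r))
    return pi_hat, P_hat, regret
```

Calling `run_example()` returns the estimated policy, the estimated transition model, and the regret; for this toy MDP the true average reward is 14/13 ≈ 1.077. The penalty weight `lam` plays the role of the sparsity-inducing regularizer: with many irrelevant features it drives their weights to exactly zero, while with the one-hot features used here it only shrinks the estimates slightly.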


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7944408
DOI: http://dx.doi.org/10.1016/j.ejcon.2020.04.003
