We consider the problem of estimating the policy and the transition probability model of a Markov Decision Process from data consisting of (state, action, next state) tuples. The transition probabilities and the policy are assumed to be parametric functions of a sparse set of features associated with these tuples. We propose two regularized maximum likelihood estimation algorithms, one for learning the transition probability model and one for learning the policy. We establish an upper bound on the regret, defined as the difference between the average reward of the estimated policy under the estimated transition probabilities and the average reward of the original (unknown) policy under the true (unknown) transition probabilities. We also provide a sample complexity result showing that low regret can be achieved with a relatively small number of training samples. We illustrate the theoretical results with a healthcare example and a robot navigation experiment.
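The abstract does not spell out the parametric form or the regularizer, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a log-linear transition model P(s'|s,a) ∝ exp(θ·φ(s,a,s')) over hypothetical tuple features φ, uses an ℓ1 penalty (our choice, to encourage the sparsity the abstract mentions) solved by proximal gradient descent (ISTA), and computes the long-run average reward of a tabular ergodic MDP from the stationary distribution of the chain induced by a policy. All function names (`transition_probs`, `mean_nll`, `fit_l1_mle`, `average_reward`, `regret`) are hypothetical, and regret is returned as an absolute gap on the assumption that the bound concerns its magnitude.

```python
import numpy as np

def transition_probs(theta, Phi):
    """Assumed log-linear model: P[s, a, s'] proportional to exp(theta . Phi[s, a, s'])."""
    logits = Phi @ theta                                  # shape (S, A, S)
    logits = logits - logits.max(axis=2, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=2, keepdims=True)

def mean_nll(theta, Phi, counts):
    """Average negative log-likelihood of the observed (s, a, s') counts."""
    P = transition_probs(theta, Phi)
    return -np.sum(counts * np.log(P)) / counts.sum()

def fit_l1_mle(Phi, counts, lam=0.01, lr=0.1, iters=2000):
    """l1-regularized MLE by proximal gradient descent (ISTA); an illustrative solver.

    Phi:    hypothetical tuple features, shape (S, A, S, d)
    counts: counts[s, a, s'] of observed (state, action, next state) tuples
    """
    n = counts.sum()
    n_sa = counts.sum(axis=2)                             # visits to each (s, a)
    emp = np.einsum("ijk,ijkd->d", counts, Phi)           # empirical feature totals
    theta = np.zeros(Phi.shape[-1])
    for _ in range(iters):
        P = transition_probs(theta, Phi)
        # Features expected under the model, weighted by (s, a) visit counts
        model = np.einsum("ij,ijk,ijkd->d", n_sa, P, Phi)
        theta = theta - lr * (model - emp) / n            # gradient step on mean NLL
        # Soft-thresholding: the proximal step for lam * ||theta||_1
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * lam, 0.0)
    return theta

def average_reward(P, pi, r):
    """Long-run average reward of policy pi under transitions P.

    P: (S, A, S) transition tensor, pi: (S, A) policy, r: (S, A) rewards.
    Assumes the chain induced by pi is ergodic (unique stationary distribution).
    """
    P_pi = np.einsum("sa,sat->st", pi, P)                 # Markov chain induced by pi
    S = P_pi.shape[0]
    # Stationary distribution: mu^T P_pi = mu^T together with sum(mu) = 1
    A = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.concatenate([np.zeros(S), [1.0]])
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    return float(mu @ np.sum(pi * r, axis=1))

def regret(P_true, pi_true, P_hat, pi_hat, r):
    """Absolute gap between the estimated and the true average reward."""
    return abs(average_reward(P_hat, pi_hat, r) - average_reward(P_true, pi_true, r))
```

Under the same assumptions, the policy can be fit with the same routine by treating the action as the predicted variable: pass policy features of shape (S, 1, A, d) and action counts of shape (S, 1, A), then reshape the output of `transition_probs` to (S, A). Plugging the estimated transition tensor and policy into `regret` alongside the true ones gives the quantity the bound is stated for, which is only computable in simulation, where the true model is known.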
| URL | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7944408 | PMC |
| http://dx.doi.org/10.1016/j.ejcon.2020.04.003 | DOI Listing |
PLoS One
January 2025
Department of Public Health, University of Helsinki, Helsinki, Finland.
Background: Health behaviors, health, and income change during aging. However, no previous studies have examined how they develop together over the transition to statutory retirement. We aimed to examine their joint development and to identify the determinants of any distinct trajectories.
IJID Reg
March 2025
Instituto Nacional de Ciencias Médicas y Nutrición Salvador Zubirán, Mexico City, Mexico.
Objectives: Advanced HIV disease (AHD) at HIV care enrollment is common in Latin America and may bias cross-sectional care continuum estimates. We therefore explored the impact of AHD on HIV care continuum outcomes using a longitudinal approach.
Methods: We analyzed trajectories of 26,174 adult people with HIV enrolled at Caribbean, Central and South America network for HIV epidemiology (CCASAnet) sites (2003-2019) using multi-state Cox regression across five stages: (i) enrolled without antiretroviral therapy (no-ART); (ii) on ART without viral suppression (viral load ≥200 copies/ml; ART + non-VS); (iii) on ART with viral suppression (viral load <200 copies/ml; ART + VS); (iv) lost to follow-up; (v) death.
Med Sci Monit
January 2025
Department of Urology, Ningbo Municipal Hospital of Traditional Chinese Medicine (TCM), Affiliated Hospital of Zhejiang Chinese Medical University, Ningbo, Zhejiang, China.
BACKGROUND: Transitional cell bladder carcinoma (tcBC) is the predominant form of bladder cancer, making up around 95% of reported cases. Prognostic factors for older individuals with tcBC differ from those affecting younger patients. The main purpose of this study was to establish a prognostic competing risk model for elderly patients with tcBC.
Geroscience
January 2025
U.S. Department of Veterans Affairs, VA National Center On Homelessness Among Veterans, Washington, DC, USA.
Arthritis, a chronic inflammatory condition linked to cardiovascular disease (CVD) and bone fracture, is more frequent among military veterans and postmenopausal women. This study examined correlates of arthritis and relationships of arthritis with risks of developing CVD, bone fractures, and mortality among postmenopausal veteran and non-veteran women. We analyzed longitudinal data on 135,790 (3,436 veteran and 132,354 non-veteran) postmenopausal women from the Women's Health Initiative who were followed up for an average of 16 years between enrollment (1993-1998) and February 17, 2024.
J Neurosci Methods
January 2025
School of Electrical and Computer Engineering, Gallogly College of Engineering, University of Oklahoma, Norman, OK 73019, USA.
Background: Recent advances in multimodal signal analysis enable the identification of subtle drug-induced anomalies in sleep that traditional methods often miss.
New Method: We develop and introduce the Dynamic Representation of Multimodal Activity and Markov States (DREAMS) framework, which embeds explainable artificial intelligence (XAI) techniques to model hidden state transitions during sleep using tensorized EEG, EMG, and EOG signals from 22 subjects across three age groups (18-29, 30-49, and 50-66 years). By combining Tucker decomposition with probabilistic Hidden Markov Modeling, we quantified age-specific, temazepam-induced hidden states and significant differences in transition probabilities.