In frequentist inference, minimizing the Hellinger distance between a kernel density estimate and a parametric family produces estimators that are both robust to outliers and statistically efficient when the parametric family contains the data-generating distribution. This paper seeks to extend these results to the use of nonparametric Bayesian density estimators within disparity methods. We propose two estimators: one replaces the kernel density estimator with the expected posterior density using a random histogram prior; the other transforms the posterior over densities into a posterior over parameters through minimizing the Hellinger distance for each density. We show that it is possible to adapt the mathematical machinery of efficient influence functions from semiparametric models to demonstrate that both our estimators are efficient in the sense of achieving the Cramér-Rao lower bound. We further demonstrate a Bernstein-von-Mises result for our second estimator, indicating that its posterior is asymptotically Gaussian. In addition, the robustness properties of classical minimum Hellinger distance estimators continue to hold.
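As a rough illustration of the classical (frequentist) starting point described above, the following sketch fits a normal location-scale family by minimizing the Hellinger distance to a Gaussian kernel density estimate. It is not the paper's Bayesian estimator: the normal family, the default `gaussian_kde` bandwidth, the integration grid, and the function name `mhd_normal_fit` are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde, norm


def mhd_normal_fit(data, grid_size=2001):
    """Minimum Hellinger distance fit of N(mu, sigma^2) to a KDE (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    kde = gaussian_kde(data)  # assumption: default (Scott) bandwidth

    # Uniform grid wide enough to cover the KDE's mass.
    pad = 3.0 * data.std()
    grid = np.linspace(data.min() - pad, data.max() + pad, grid_size)
    dx = grid[1] - grid[0]
    sqrt_kde = np.sqrt(kde(grid))

    def neg_affinity(params):
        mu, log_sigma = params
        model = norm.pdf(grid, loc=mu, scale=np.exp(log_sigma))
        # Minimizing Hellinger distance between two densities is equivalent
        # to maximizing the affinity: the integral of sqrt(f * g).
        return -np.sum(sqrt_kde * np.sqrt(model)) * dx

    # Robust starting values: median and scaled median absolute deviation.
    med = np.median(data)
    mad = 1.4826 * np.median(np.abs(data - med))
    result = minimize(neg_affinity, x0=[med, np.log(mad)], method="Nelder-Mead")
    return float(result.x[0]), float(np.exp(result.x[1]))
```

On a sample drawn mostly from one normal distribution with a small cluster of distant outliers, the fitted location would be expected to stay close to the bulk of the data while the sample mean is pulled toward the outliers. The fitted scale is inflated somewhat by the kernel smoothing, so bandwidth choice matters in practice.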
| Download full-text PDF | Source |
|---|---|
| http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512539 | PMC |
| http://dx.doi.org/10.3390/e20120955 | DOI Listing |
Entropy (Basel)
December 2024
Department of Communication and Space Technologies, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia.
A method for evaluating Kullback-Leibler (KL) divergence and Squared Hellinger (SH) distance between empirical data and a model distribution is proposed. This method exclusively utilises the empirical Cumulative Distribution Function (CDF) of the data and the CDF of the model, avoiding data processing such as histogram binning. The proposed method converges almost surely, with the proof based on the use of exponentially distributed waiting times.
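For reference, the two quantities named in this abstract are commonly defined as follows for densities $p$ (data) and $q$ (model). The abstract does not state its normalization or the direction of the KL divergence, so the factor $\tfrac{1}{2}$ and the ordering $P \,\|\, Q$ below are assumptions based on one widespread convention.

```latex
% Assumed conventions: KL taken from the data distribution P to the model Q,
% and squared Hellinger distance normalized to lie in [0, 1].
D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx ,
\qquad
H^{2}(P, Q) = \frac{1}{2} \int \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^{2} dx
            = 1 - \int \sqrt{p(x)\, q(x)} \, dx .
```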
Stud Health Technol Inform
November 2024
Bern University of Applied Sciences, Switzerland.
Pharmacogenetics (PGx) explores the influence of genetic variability on drug efficacy and tolerability. Synthetic Data Generation (SDG) has emerged as a promising alternative to the labor-intensive process of collecting real-world PGx data, which is required for high-quality prediction models. This study investigates the performance of two Generative Adversarial Network (GAN) models, CTGAN and CTAB-GAN+, in generating synthetic PGx data.
Entropy (Basel)
September 2024
Department of Computer and Network Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan.
Two typical fixed-length random number generation problems in information theory are considered for sources. One is the source resolvability problem and the other is the intrinsic randomness problem. In each of these problems, the optimum achievable rate with respect to the given approximation measure is one of our main concerns and has been characterized using two different information quantities: the information spectrum and the smooth Rényi entropy.
J Safety Res
September 2024
Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe St, Baltimore, MD 21205, USA; Center for Injury Research and Policy, Johns Hopkins Bloomberg School of Public Health, 624 N. Broadway, Baltimore, MD 21205, USA.
Introduction: This study addresses the lack of methods to quantify driver familiarity with roadways; unfamiliarity with a route poses a higher risk of crashes.
Method: We present a new approach to assessing driving route diversity and familiarity using data from the DrivingApp, a smartphone-based research tool that collects trip-level information, including driving exposure and global positioning system (GPS) data, from young novice drivers (15-19 years old) to older drivers (67-78 years old). Using these data, we developed a GPS data-based algorithm to analyze the uniqueness of driving routes.
Sci Rep
August 2024
Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Republic of Korea.
This study addresses challenges related to privacy issues in utilizing medical data, particularly the protection of personal information. To overcome this obstacle, the research focuses on data synthesis using real-world time-series generative adversarial networks (RTSGAN). A total of 53,005 data were synthesized using the dataset of 15,799 patients with colorectal cancer.