Asymptotic Properties for Methods Combining the Minimum Hellinger Distance Estimate and the Bayesian Nonparametric Density Estimate.

Entropy (Basel)

Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.

Published: December 2018

In frequentist inference, minimizing the Hellinger distance between a kernel density estimate and a parametric family produces estimators that are both robust to outliers and statistically efficient when the parametric family contains the data-generating distribution. This paper seeks to extend these results to the use of nonparametric Bayesian density estimators within disparity methods. We propose two estimators: one replaces the kernel density estimator with the expected posterior density under a random histogram prior; the other transforms the posterior over densities into a posterior over parameters by minimizing the Hellinger distance for each density. We show that it is possible to adapt the mathematical machinery of efficient influence functions from semiparametric models to demonstrate that both our estimators are efficient in the sense of achieving the Cramér-Rao lower bound. We further demonstrate a Bernstein-von Mises result for our second estimator, indicating that its posterior is asymptotically Gaussian. In addition, the robustness properties of classical minimum Hellinger distance estimators continue to hold.
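The two estimators described in the abstract can be illustrated with a small sketch. It is not the paper's implementation and rests on simplifying assumptions introduced here for illustration: a fixed number of equal-width bins with a symmetric Dirichlet prior on the bin weights stands in for the random histogram prior, the parametric family is N(mu, 1) with known unit scale, and the minimization is a plain grid search. The constants LO, HI, K, ALPHA and all function names are hypothetical. The squared Hellinger distance is taken as H^2 = 1 - integral of sqrt(g * f_mu); other conventions differ only by a constant factor and give the same minimizer.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

# Hypothetical settings (assumptions, not taken from the paper).
LO, HI, K = -6.0, 14.0, 40        # histogram range and number of bins
WIDTH = (HI - LO) / K             # common bin width
ALPHA = 1.0                       # symmetric Dirichlet prior parameter


def bin_counts(data):
    """Count observations falling in each of the K equal-width bins."""
    counts = [0] * K
    for x in data:
        j = math.floor((x - LO) / WIDTH)
        if 0 <= j < K:
            counts[j] += 1
    return counts


def sqrt_normal_mass(mu):
    """Integral of sqrt(f_mu) over each bin, where f_mu is the N(mu, 1) density.

    sqrt(f_mu) equals (8*pi)**0.25 times the N(mu, sd=sqrt(2)) density,
    so each bin integral is a difference of normal CDF values.
    """
    wide = NormalDist(mu, math.sqrt(2.0))
    scale = (8.0 * math.pi) ** 0.25
    cdf = [wide.cdf(LO + j * WIDTH) for j in range(K + 1)]
    return [scale * (cdf[j + 1] - cdf[j]) for j in range(K)]


# Candidate locations and their per-bin sqrt-density integrals.
MU_GRID = [-3.0 + 0.02 * i for i in range(301)]
TABLE = [sqrt_normal_mass(mu) for mu in MU_GRID]


def mhd_location(weights):
    """Grid-search minimizer of H^2 between a histogram density and N(mu, 1)."""
    roots = [math.sqrt(w / WIDTH) for w in weights]  # sqrt of bin heights

    def h2(i):
        return 1.0 - sum(r * s for r, s in zip(roots, TABLE[i]))

    return MU_GRID[min(range(len(MU_GRID)), key=h2)]


# Simulated data: 200 draws from N(0, 1) plus 20 gross outliers at 10.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [10.0] * 20
counts = bin_counts(data)
n = sum(counts)

# Estimator 1: minimize H^2 against the expected posterior density.
# Under the Dirichlet prior the posterior mean bin weights are available directly.
post_mean_w = [(ALPHA + c) / (K * ALPHA + n) for c in counts]
mu_hat = mhd_location(post_mean_w)


def dirichlet_draw():
    """One draw of bin weights from the Dirichlet posterior (via gamma variates)."""
    g = [rng.gammavariate(ALPHA + c, 1.0) for c in counts]
    total = sum(g)
    return [x / total for x in g]


# Estimator 2: map each posterior density draw to its minimizing mu,
# which induces a posterior over the parameter.
mu_draws = [mhd_location(dirichlet_draw()) for _ in range(200)]

print("sample mean (not robust):", round(fmean(data), 3))
print("estimator 1 (posterior-mean density):", round(mu_hat, 3))
print("estimator 2 posterior mean and sd:",
      round(fmean(mu_draws), 3), round(stdev(mu_draws), 3))
```

In this sketch the sample mean is pulled toward the outliers, while both Hellinger-based summaries should stay close to the centre of the uncontaminated bulk, in line with the robustness property the abstract describes. The grid step and bin width limit the resolution of the estimates, so the numbers are illustrative only.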

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7512539
DOI: http://dx.doi.org/10.3390/e20120955

Similar Publications

Semi-Empirical Approach to Evaluating Model Fit for Sea Clutter Returns: Focusing on Future Measurements in the Adriatic Sea.

Entropy (Basel)

December 2024

Department of Communication and Space Technologies, Faculty of Electrical Engineering and Computing, University of Zagreb, 10000 Zagreb, Croatia.

A method for evaluating Kullback-Leibler (KL) divergence and Squared Hellinger (SH) distance between empirical data and a model distribution is proposed. This method exclusively utilises the empirical Cumulative Distribution Function (CDF) of the data and the CDF of the model, avoiding data processing such as histogram binning. The proposed method converges almost surely, with the proof based on the use of exponentially distributed waiting times.

Pharmacogenetics (PGx) explores the influence of genetic variability on drug efficacy and tolerability. Synthetic Data Generation (SDG) has emerged as a promising alternative to the labor-intensive process of collecting real-world PGx data, which is required for high-quality prediction models. This study investigates the performance of two Generative Adversarial Network (GAN) models, CTGAN and CTAB-GAN+, in generating synthetic PGx data.

Two typical fixed-length random number generation problems in information theory are considered for sources. One is the source resolvability problem and the other is the intrinsic randomness problem. In each of these problems, the optimum achievable rate with respect to the given approximation measure is one of our main concerns and has been characterized using two different information quantities: the information spectrum and the smooth Rényi entropy.

Development of an algorithm for analysis of routes: Case studies using novice and older drivers.

J Safety Res

September 2024

Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Johns Hopkins University, 615 N Wolfe St, Baltimore, MD 21205, USA; Center for Injury Research and Policy, Johns Hopkins Bloomberg School of Public Health, 624 N. Broadway, Baltimore, MD 21205, USA.

Introduction: This study addresses the lack of methods for quantifying driver familiarity with roadways; driving on unfamiliar roadways poses a higher risk of crashes.

Method: We present a new approach to assessing driving route diversity and familiarity using data from the DrivingApp, a smartphone-based research tool that collects trip-level information, including driving exposure and global positioning system (GPS) data, from young novice drivers (15-19 years old) to older drivers (67-78 years old). Using these data, we developed a GPS data-based algorithm to analyze the uniqueness of driving routes.

Synthesis and quality assessment of combined time-series and static medical data using a real-world time-series generative adversarial network.

Sci Rep

August 2024

Department of Digital Health, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University, Seoul, Republic of Korea.

This study addresses challenges related to privacy issues in utilizing medical data, particularly the protection of personal information. To overcome this obstacle, the research focuses on data synthesis using real-world time-series generative adversarial networks (RTSGAN). A total of 53,005 data were synthesized using the dataset of 15,799 patients with colorectal cancer.
