HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION.

Ann Stat

Department of Biostatistics, Harvard University, 655 Huntington Avenue, SPH2, 4th Floor, Boston, Massachusetts 02115, USA.

Published: February 2015

In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of the detection boundary that does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of the signal strength. For binary design matrices with a sparsity index that is not too high, our results parallel those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio test is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show that it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies.
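The abstract does not spell out the extended Higher Criticism Test, so the following is only a minimal sketch of the classical Higher Criticism statistic (in the sense of Donoho and Jin) computed from a list of p-values, which is the kind of statistic such sparse-regime tests build on. The function name `higher_criticism`, the parameter `alpha0`, and the handling of degenerate p-values are assumptions made for illustration; none of them come from the paper, and this is not the paper's extended test.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Classical Higher Criticism statistic (illustrative sketch only).

    Computes max over 1 <= i <= alpha0 * n of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted p-values.
    `alpha0` is an assumed tuning fraction, not a value from the paper.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")

    sorted_p = sorted(p_values)
    k_max = max(1, int(alpha0 * n))  # only scan the smallest p-values

    best = -math.inf
    for i in range(1, k_max + 1):
        p = sorted_p[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid dividing by zero
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best  # stays -inf if every scanned p-value was degenerate
```

A large value of the statistic suggests that a small fraction of the p-values is smaller than expected under the global null; how to calibrate a rejection threshold is left open here and would depend on the setting.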

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4522432
DOI: http://dx.doi.org/10.1214/14-AOS1279
