A modern maximum-likelihood theory for high-dimensional logistic regression.

Proc Natl Acad Sci U S A

Department of Statistics, Stanford University, Stanford, CA 94305

Published: July 2019

Students in statistics or data science usually learn early on that when the sample size n is large relative to the number of variables p, fitting a logistic model by the method of maximum likelihood produces consistent estimates, and that well-known formulas quantify the variability of these estimates for the purpose of statistical inference. We are often told that these calculations are approximately valid if we have 5 to 10 observations per unknown parameter. This paper shows that this is far from the case, and consequently, inferences produced by common software packages are often unreliable. Consider a logistic model with independent features in which n and p become increasingly large in a fixed ratio. We prove that (i) the maximum-likelihood estimate (MLE) is biased, (ii) the variability of the MLE is far greater than classically estimated, and (iii) the likelihood-ratio test (LRT) is not distributed as a χ². The bias of the MLE yields wrong predictions for the probability of a case based on observed values of the covariates. We present a theory that provides explicit expressions for the asymptotic bias and variance of the MLE and the asymptotic distribution of the LRT. We empirically demonstrate that these results are accurate in finite samples. Our results depend only on a single measure of signal strength, which leads to concrete proposals for obtaining accurate inference in finite samples through estimation of this measure.
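The bias claim (i) and its effect on predicted probabilities can be checked with a small simulation. The sketch below is an illustration under assumptions of our own, not the paper's experiment or estimator: Gaussian features x_i ~ N(0, I_p), no intercept, half of the coefficients nonzero, and signal strength taken as γ² = Var(x_iᵀβ). The settings n = 2000, p = 200 (so p/n = 0.1) and γ² = 5 are hypothetical choices, and `logistic_mle` / `simulate` are hypothetical helpers that fit the ordinary MLE by a generic damped Newton-Raphson loop. The slope of the fitted coefficients on the true ones estimates the inflation of the MLE; values above 1 mean the MLE overstates effect sizes.

```python
import numpy as np


def sigmoid(eta):
    # tanh form of the logistic function avoids overflow for large |eta|
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def logistic_mle(X, y, max_iter=100, tol=1e-9):
    """Hypothetical helper: logistic MLE (no intercept) by damped Newton-Raphson."""
    beta = np.zeros(X.shape[1])

    def loglik(b):
        eta = X @ b
        # sum of y*eta - log(1 + e^eta), with the log term computed stably
        return np.sum(y * eta - np.logaddexp(0.0, eta))

    ll = loglik(beta)
    for _ in range(max_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (y - mu)                            # score vector
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None])    # information matrix X'WX
        step = np.linalg.solve(hess, grad)               # Newton direction
        t = 1.0
        # halve the step until the log-likelihood no longer decreases
        while loglik(beta + t * step) < ll and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        ll_new = loglik(beta)
        if abs(ll_new - ll) < tol:
            break
        ll = ll_new
    return beta


def simulate(n=2000, p=200, gamma2=5.0, seed=0):
    """One draw from an assumed setting: x_i ~ N(0, I_p), half of beta nonzero,
    scaled so that Var(x_i' beta) = ||beta||^2 = gamma2."""
    rng = np.random.default_rng(seed)
    k = p // 2
    beta = np.zeros(p)
    beta[:k] = np.sqrt(gamma2 / k)
    X = rng.standard_normal((n, p))
    y = (rng.random(n) < sigmoid(X @ beta)).astype(float)
    beta_hat = logistic_mle(X, y)

    # Inflation factor: least-squares slope (through the origin) of beta_hat on beta
    alpha_hat = (beta_hat @ beta) / (beta @ beta)

    # Compare how far true and fitted probabilities sit from 1/2 on fresh covariates
    X_new = rng.standard_normal((n, p))
    spread_true = np.mean(np.abs(sigmoid(X_new @ beta) - 0.5))
    spread_hat = np.mean(np.abs(sigmoid(X_new @ beta_hat) - 0.5))
    return alpha_hat, spread_true, spread_hat


if __name__ == "__main__":
    alpha_hat, spread_true, spread_hat = simulate()
    print(f"estimated inflation alpha_hat = {alpha_hat:.3f}")
    print(f"mean |p - 1/2|: true {spread_true:.3f}, fitted {spread_hat:.3f}")
```

In this setting one would expect `alpha_hat` to exceed 1 and the fitted probabilities to be pushed further toward 0 and 1 than the true ones, in line with the abstract's statement that the bias of the MLE distorts predicted probabilities. Raising p/n in the sketch should make both effects larger, provided the MLE still exists (for large enough p/n the data become separable and the Newton iterates diverge).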

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6642380
DOI: http://dx.doi.org/10.1073/pnas.1810420116
