To tune or not to tune, a case study of ridge logistic regression in small or sparse datasets.

BMC Med Res Methodol

Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria.

Published: September 2021

Background: For finite samples with binary outcomes, penalized logistic regression such as ridge logistic regression has the potential to achieve smaller mean squared errors (MSE) of coefficients and predictions than maximum likelihood estimation. There is evidence, however, that ridge logistic regression can result in highly variable calibration slopes in small or sparse data situations.

Methods: In this paper, we elaborate on this issue by performing a comprehensive simulation study, investigating the performance of ridge logistic regression in terms of coefficients and predictions and comparing it to Firth's correction, which has been shown to perform well in low-dimensional settings. In addition to tuned ridge regression, where the penalty strength is estimated from the data by minimizing some measure of the out-of-sample prediction error or an information criterion, we also considered ridge regression with a pre-specified degree of shrinkage. We included 'oracle' models in the simulation study, in which the complexity parameter was chosen based on the true event probabilities (prediction oracle) or regression coefficients (explanation oracle), to demonstrate the capability of ridge regression if the truth were known.
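To make the contrast between tuned and pre-specified shrinkage concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It fits ridge logistic regression by Newton's method with an unpenalized intercept, then compares a fixed penalty with one chosen by leave-one-out cross-validated deviance. The toy data, the penalty grid, the value of `lam_fixed`, and all function names are assumptions made for illustration only; the paper's simulation design and tuning criteria are not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(A, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge_logistic(X, y, lam, n_iter=50, tol=1e-10):
    """Newton's method for loglik(beta) - (lam / 2) * sum of squared slopes.

    Each row of X starts with a constant 1; the intercept is not penalized.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        for j in range(1, p):  # the penalty applies to slopes only
            grad[j] -= lam * beta[j]
            hess[j][j] += lam
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def loo_deviance(X, y, lam):
    """Leave-one-out cross-validated deviance for a given penalty strength."""
    total = 0.0
    for i in range(len(y)):
        beta = fit_ridge_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        mu = sigmoid(sum(b * v for b, v in zip(beta, X[i])))
        mu = min(max(mu, 1e-12), 1.0 - 1e-12)  # guard the logarithm
        total += -2.0 * (y[i] * math.log(mu) + (1 - y[i]) * math.log(1.0 - mu))
    return total


# Simulated toy data (assumed values, not the paper's simulation design).
random.seed(1)
n = 40
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y = [1 if random.random() < sigmoid(-1.0 + 1.0 * r[1] + 0.5 * r[2]) else 0 for r in X]

grid = [0.5, 2.0, 8.0, 32.0]  # hypothetical candidate penalties
lam_fixed = 2.0               # hypothetical pre-specified penalty
lam_tuned = min(grid, key=lambda lam: loo_deviance(X, y, lam))

beta_fixed = fit_ridge_logistic(X, y, lam_fixed)
beta_tuned = fit_ridge_logistic(X, y, lam_tuned)
print("tuned lambda:", lam_tuned)
print("slopes (pre-specified):", beta_fixed[1:])
print("slopes (tuned):", beta_tuned[1:])
```

Rerunning the sketch with different seeds or smaller `n` gives a rough feel for how much the tuned penalty can move from one dataset to the next, while the pre-specified penalty stays fixed by construction.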

Results: The performance of ridge regression strongly depends on the choice of the complexity parameter. As shown in our simulation and illustrated by a data example, values optimized in small or sparse datasets are negatively correlated with optimal values and suffer from substantial variability, which translates into large MSE of coefficients and large variability of calibration slopes. In contrast, in our simulations, pre-specifying the degree of shrinkage prior to fitting led to accurate coefficients and predictions even in non-ideal settings such as those encountered in the context of rare outcomes or sparse predictors.
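The calibration slope mentioned above is commonly computed as the slope from a logistic regression of observed outcomes on a model's linear predictor (the log-odds it predicts). The sketch below shows that calculation in plain Python under this common definition; the function name and the toy values of `lp` and `y` are hypothetical, and the abstract does not state how the authors computed the slope in their simulations.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def calibration_slope(lp, y, n_iter=50, tol=1e-10):
    """Slope b from the logistic regression y ~ a + b * lp (Newton's method)."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for li, yi in zip(lp, y):
            mu = sigmoid(a + b * li)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * li     # score for the slope
            h00 += w
            h01 += w * li
            h11 += w * li * li
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # 2x2 Newton step via the explicit inverse
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return b


# Hypothetical linear predictors from some fitted model and observed outcomes.
lp = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

slope = calibration_slope(lp, y)
print("calibration slope:", slope)
```

Under this definition, a slope below 1 suggests predictions that are too extreme and a slope above 1 suggests predictions that are shrunk too far, so a highly variable slope across datasets signals unstable shrinkage.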

Conclusions: Applying tuned ridge regression in small or sparse datasets is problematic, as it results in unstable coefficients and predictions. In contrast, determining the degree of shrinkage according to meaningful prior assumptions about true effects has the potential to reduce bias and stabilize the estimates.


Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8482588
DOI: http://dx.doi.org/10.1186/s12874-021-01374-y


