We study a marginal empirical likelihood approach in settings where the number of variables grows exponentially with the sample size. We systematically examine the marginal empirical likelihood ratios as functions of the parameters of interest, and find that the marginal empirical likelihood ratio evaluated at zero can be used to determine whether an explanatory variable contributes to a response variable. Based on this finding, we propose a unified feature screening procedure for linear models and generalized linear models. Unlike most existing feature screening approaches, which rely on the magnitudes of marginal estimators to identify true signals, the proposed approach also incorporates the level of uncertainty of such estimators. This merit is inherited from the self-studentization property of the empirical likelihood approach, and it extends the insights of existing feature screening methods. Moreover, we show that our screening approach is less restrictive in its distributional assumptions, and can be conveniently adapted to a broad range of scenarios, such as models specified by general moment conditions. Our theoretical results, together with extensive simulations and data analysis, demonstrate the merits of the marginal empirical likelihood approach.
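The screening idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the marginal moment function for variable j is taken to be g_i = X_ij * Y_i on standardized data (a linear-model-style condition evaluated at a zero coefficient), the Lagrange multiplier is found by plain bisection, and the function names `el_ratio_at_zero` and `marginal_el_screen`, the synthetic data, and the cutoff `keep` are all hypothetical.

```python
import numpy as np


def el_ratio_at_zero(g, n_iter=100):
    """Empirical likelihood ratio statistic for H0: E[g] = 0, scalar g.

    Returns 2 * sum(log(1 + lam * g_i)), where lam solves
    sum(g_i / (1 + lam * g_i)) = 0. If zero lies outside the range
    of the g_i, the statistic is infinite.
    """
    g = np.asarray(g, dtype=float)
    if g.min() >= 0.0 or g.max() <= 0.0:
        return np.inf
    # Feasible multipliers keep every weight positive: 1 + lam * g_i > 0.
    lo, hi = -1.0 / g.max(), -1.0 / g.min()
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    # The score is strictly decreasing in lam, so bisection is enough.
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(g / (1.0 + mid * g)) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * np.sum(np.log1p(lam * g))


def marginal_el_screen(X, y, keep):
    """Rank variables by their marginal EL ratio at zero; keep the largest."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    yc = y - y.mean()
    stats = np.array(
        [el_ratio_at_zero(Xs[:, j] * yc) for j in range(X.shape[1])]
    )
    order = np.argsort(-stats)  # large statistic = evidence of a signal
    return order[:keep], stats


# Hypothetical synthetic example: 3 true signals among 500 variables.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + 3.0 * X[:, 2] + rng.standard_normal(n)
selected, stats = marginal_el_screen(X, y, keep=10)
print(sorted(selected.tolist()))
```

Because the statistic is a likelihood ratio rather than a raw marginal coefficient, it is scaled by the sample variability of the moment function, which is the self-studentization property the abstract refers to.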

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3887322
DOI: http://dx.doi.org/10.1214/13-AOS1139

Publication Analysis

Top Keywords

empirical likelihood: 24
marginal empirical: 20
feature screening: 16
likelihood approach: 12
linear models: 8
existing feature: 8
screening approach: 8
marginal: 6
likelihood: 6
screening: 6

Similar Publications

Objective: To develop a population pharmacokinetic (PK) model to characterize serum pegcetacoplan concentration-time data after intravitreal administration in patients with geographic atrophy (GA) or neovascular age-related macular degeneration (nAMD).

Design: Pharmacokinetic modeling.

Participants: Two hundred sixty-one patients with GA or nAMD enrolled in 4 clinical studies of pegcetacoplan.

Background: The clinical phenotypes of myelin oligodendrocyte glycoprotein antibody-associated disease (MOGAD) have been found to overlap with those of several other diseases. The new criteria proposed in 2023 were designed to better identify the disease but require validation across various populations to ascertain their clinical utility. We aimed to investigate their diagnostic performance in phenotypically diverse patients.

Data-driven discovery and parameter estimation of mathematical models in biological pattern formation.

PLoS Comput Biol

January 2025

Department of Anatomy and Cell Biology, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Fukuoka, Japan.

Mathematical modeling has been used to explain biological pattern formation, but the selection of models and parameters has been made empirically. In the present study, we propose a data-driven approach to validate the applicability of mathematical models. Specifically, we developed methods to automatically select appropriate mathematical models based on the patterns of interest and to estimate the model parameters.

Currently in wheat breeding, genome-wide association studies (GWAS) have successfully revealed the genetic basis of complex traits such as nitrogen use efficiency (NUE) and its biological processes. In the GWAS model, thresholding is a common strategy to indicate deviation from the expected range of -(s), and it can be used to find the distribution of true positive associations under or over of test statistics. The threshold therefore plays a critical role in identifying reliable and significant associations across the genome while keeping the proportion of false positive results relatively low.

edgeR is an R/Bioconductor software package for differential analyses of sequencing data in the form of read counts for genes or genomic features. Over the past 15 years, edgeR has been a popular choice for statistical analysis of data from sequencing technologies such as RNA-seq or ChIP-seq. edgeR pioneered the use of the negative binomial distribution to model read count data with replicates and the use of generalized linear models to analyze complex experimental designs.
