Due to the prevalence of complex data, data heterogeneity is often observed in contemporary scientific studies and various applications. Motivated by studies on cancer cell lines, we consider the analysis of heterogeneous subpopulations with binary responses and high-dimensional covariates. In many practical scenarios, it is common to use a single regression model for the entire data set. To do this effectively, it is critical to quantify the heterogeneity of the effect of covariates across subpopulations through appropriate statistical inference. However, the high dimensionality and discrete nature of the data can lead to challenges in inference. Therefore, we propose a novel statistical inference method for a high-dimensional logistic regression model that accounts for heterogeneous subpopulations. Our primary goal is to investigate heterogeneity across subpopulations by testing the equivalence of the effect of a covariate and the significance of the overall effects of a covariate. To achieve overall sparsity of the coefficients and their fusions across subpopulations, we employ a fused group Lasso penalization method. In addition, we develop a statistical inference method that incorporates bias correction of the proposed penalized method. To address computational issues due to the nonlinear log-likelihood and the fused Lasso penalty, we propose a computationally efficient algorithm by adapting the ideas of the proximal gradient method and the alternating direction method of multipliers (ADMM) to our settings. Furthermore, we develop non-asymptotic analyses for the proposed fused group Lasso and prove that the debiased test statistics admit chi-squared approximations even in the presence of high-dimensional variables. In simulations, the proposed test outperforms existing methods. The practical effectiveness of the proposed method is demonstrated by analyzing data from the Cancer Cell Line Encyclopedia (CCLE).
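
The proximal gradient idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it handles only a group Lasso penalty that ties each covariate's coefficients together across subpopulations, and it omits the fusion term that the paper treats with ADMM. The function names, the fixed step size, and the penalty weight `lam` are all assumptions made for illustration.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrinks the whole vector toward zero."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def logistic_gradient(X, y, beta):
    """Gradient of the average negative log-likelihood for one subpopulation."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / len(y)

def prox_gradient_group_lasso(Xs, ys, lam, step=0.1, n_iter=500):
    """Hypothetical helper (not from the paper).

    Xs, ys: lists of per-subpopulation design matrices and 0/1 responses.
    Returns B of shape (p, K); row j holds covariate j's effect in each
    of the K subpopulations. The fusion penalty is deliberately left out.
    """
    p, K = Xs[0].shape[1], len(Xs)
    B = np.zeros((p, K))
    for _ in range(n_iter):
        # Gradient step on the smooth logistic loss, one column per subpopulation.
        G = np.column_stack(
            [logistic_gradient(Xs[k], ys[k], B[:, k]) for k in range(K)]
        )
        Z = B - step * G
        # Proximal step: group soft-thresholding applied to each covariate's row.
        B = np.vstack([group_soft_threshold(Z[j], step * lam) for j in range(p)])
    return B
```

A row of `B` that is shrunk exactly to zero corresponds to a covariate with no estimated effect in any subpopulation, which is the kind of overall sparsity the abstract refers to. A fixed step size is used here for simplicity; a line search would be a natural alternative.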

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10713553
DOI: http://dx.doi.org/10.1038/s41598-023-48903-x
