Publications by authors named "Chong-Zhi Di"

Functional principal component analysis (FPCA) has been widely used to capture major modes of variation and reduce dimensions in functional data analysis. However, standard FPCA based on the sample covariance estimator does not work well if the data exhibit heavy tails or outliers. To address this challenge, a new robust FPCA approach based on a functional pairwise spatial sign (PASS) operator, termed PASS FPCA, is introduced.
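As a rough illustration of the pairwise spatial sign idea, the sketch below works on curves observed on a common grid. It averages the outer products of unit-normalized pairwise differences and extracts the leading direction by power iteration. This is not the authors' implementation: the function names, the discretization, the simulated data, and the use of power iteration are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def pass_covariance(curves):
    """Average outer product of unit-normalized pairwise differences.

    `curves` is a list of equal-length lists (curves on a common grid).
    Normalizing each difference to unit length limits the influence of
    any single extreme curve.
    """
    p = len(curves[0])
    cov = [[0.0] * p for _ in range(p)]
    n_pairs = 0
    for x, y in combinations(curves, 2):
        diff = [a - b for a, b in zip(x, y)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # identical curves carry no direction
        unit = [d / norm for d in diff]
        for i in range(p):
            for j in range(p):
                cov[i][j] += unit[i] * unit[j]
        n_pairs += 1
    return [[v / n_pairs for v in row] for row in cov]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration for the leading eigenvector of a symmetric matrix."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def simulate_curves(n, p, rng):
    """Hypothetical data: one mode sin(pi * t) with heavy-tailed scores."""
    grid = [(j + 0.5) / p for j in range(p)]
    phi = [math.sin(math.pi * t) for t in grid]
    phi_norm = math.sqrt(sum(v * v for v in phi))
    phi_unit = [v / phi_norm for v in phi]
    curves = []
    for _ in range(n):
        score = math.tan(math.pi * (rng.random() - 0.5))  # Cauchy-distributed
        curves.append([score * v + rng.gauss(0.0, 0.05) for v in phi])
    return curves, phi_unit


curves, phi_unit = simulate_curves(40, 10, random.Random(1))
estimate = leading_eigenvector(pass_covariance(curves))
alignment = abs(sum(a * b for a, b in zip(estimate, phi_unit)))
print(f"alignment with the true mode: {alignment:.3f}")
```

Because every normalized difference has unit length, the resulting matrix has trace 1, so its eigenvalues can be read as shares of direction rather than of variance.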

Functional data arise frequently in biomedical studies, where it is often of interest to investigate the association between functional predictors and a scalar response variable. While functional linear models (FLM) are widely used to address these questions, hypothesis testing for the functional association in the FLM framework remains challenging. A popular approach to testing the functional effects is through dimension reduction by functional principal component (PC) analysis.

Parametric and semiparametric mixture models have been widely used in applications from many areas, and it is often of interest to test homogeneity in these models. However, hypothesis testing is nonstandard due to the fact that several regularity conditions do not hold under the null hypothesis. We consider a semiparametric mixture case-control model, in the sense that the density ratio of two distributions is assumed to be of an exponential form, while the baseline density is unspecified.

The development of next-generation sequencing technologies has allowed researchers to study comprehensively the contribution of genetic variation, particularly rare variants, to complex diseases. To date, many sequencing analyses of rare variants have focused on marginal genetic effects and have not explored the potential role environmental factors play in modifying genetic risk. Analysis of gene-environment interaction (GxE) for rare variants poses considerable challenges because of variant rarity and the paucity of subjects who carry the variants while being exposed.

To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcomes, however, such a sampling strategy may be length biased. Coupled with censoring, analysis of length-biased data can be quite challenging because of induced informative censoring, in which the survival time and censoring time are correlated through a common backward recurrence time.
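Length bias on its own can be seen in a small simulation. The sketch below draws durations with probability proportional to their length, then compares the naive sample mean with a harmonic-mean correction, which relies on the identity E[1/T] = 1/mu under length-biased sampling. It deliberately ignores censoring, so it does not address the informative-censoring problem described above. The population distribution, sample sizes, and function names are assumptions chosen for illustration.

```python
import random


def length_biased_sample(population, k, rng):
    """Draw k durations with probability proportional to their length."""
    return rng.choices(population, weights=population, k=k)


def naive_mean(sample):
    """Plain average; biased upward under length-biased sampling."""
    return sum(sample) / len(sample)


def length_bias_corrected_mean(sample):
    """Harmonic mean, valid for uncensored length-biased durations."""
    return len(sample) / sum(1.0 / t for t in sample)


rng = random.Random(0)
# Hypothetical population: T = 0.5 + Exponential(1), so the true mean is 1.5.
population = [0.5 + rng.expovariate(1.0) for _ in range(50_000)]
sample = length_biased_sample(population, 20_000, rng)

print(f"naive mean:     {naive_mean(sample):.3f}")
print(f"corrected mean: {length_bias_corrected_mean(sample):.3f}")
```

For this population the length-biased mean is E[T^2]/E[T] = 3.25/1.5, roughly 2.17, so the naive average overstates the true mean of 1.5 while the harmonic mean should land close to it.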

We consider likelihood ratio tests (LRT) and their modifications for homogeneity in admixture models. The admixture model is a two-component mixture model, where one component is indexed by an unknown parameter while the parameter value for the other component is known. This model is widely used in genetic linkage analysis under heterogeneity in which the kernel distribution is binomial.
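A minimal sketch of the unmodified likelihood ratio statistic for a binomial admixture model follows. It assumes the known component is Bin(m, 0.5), as in linkage analysis where 0.5 corresponds to no linkage, and it maximizes the likelihood by a coarse grid search over the mixing proportion and the unknown parameter. The function names, the grid, and the restriction of the unknown parameter to (0, 0.5] are assumptions for illustration; the modified tests discussed in the paper are not reproduced here.

```python
import math


def binom_pmf(x, m, p):
    """Binomial probability of x successes in m trials."""
    return math.comb(m, x) * p**x * (1 - p) ** (m - x)


def log_lik(data, m, gamma, theta, theta0=0.5):
    """Log-likelihood of (1 - gamma) * Bin(m, theta0) + gamma * Bin(m, theta)."""
    total = 0.0
    for x in data:
        mix = (1 - gamma) * binom_pmf(x, m, theta0) + gamma * binom_pmf(x, m, theta)
        total += math.log(mix)
    return total


def lrt_statistic(data, m, theta0=0.5, grid=50):
    """Twice the gap between the grid-maximized and null log-likelihoods."""
    null = log_lik(data, m, 0.0, theta0, theta0)
    best = null
    for i in range(grid + 1):
        gamma = i / grid
        for j in range(1, grid + 1):  # theta stays strictly positive
            theta = theta0 * j / grid
            best = max(best, log_lik(data, m, gamma, theta, theta0))
    return 2.0 * (best - null)


# Hypothetical counts out of m = 4 for ten families.
homogeneous = [2] * 10
mixed = [0, 0, 0, 0, 0, 2, 2, 1, 3, 2]
print(lrt_statistic(homogeneous, 4))
print(lrt_statistic(mixed, 4))
```

The grid search keeps the example dependency-free; a numerical optimizer or an EM algorithm would be the more usual choice in practice.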

We introduce Generalized Multilevel Functional Linear Models (GMFLMs), a novel statistical framework for regression models where exposure has a multilevel functional structure. We show that GMFLMs are, in fact, generalized multilevel mixed models (GLMMs). Thus, GMFLMs can be analyzed using the mixed effects inferential machinery and can be generalized within a well-researched statistical framework.

Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution.
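The data-generating structure described here can be sketched as a simulation: each cluster draws its own mixing probabilities from a shared Dirichlet distribution, each member draws a latent class from those probabilities, and item responses follow class-specific probabilities. Binary items, the parameter values, and the function names are assumptions for illustration; this is a simulation of the model structure only, not an estimation procedure.

```python
import random


def draw_dirichlet(alpha, rng):
    """Dirichlet draw via normalized independent gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def simulate_cluster(n_members, alpha, item_probs, rng):
    """Simulate one cluster under a multilevel latent class structure.

    alpha:      Dirichlet parameters shared by all clusters.
    item_probs: item_probs[c][k] is the probability that a member of
                latent class c endorses binary item k (hypothetical values).
    """
    mixing = draw_dirichlet(alpha, rng)  # cluster-specific random effect
    members = []
    for _ in range(n_members):
        latent = rng.choices(range(len(alpha)), weights=mixing, k=1)[0]
        responses = [int(rng.random() < p) for p in item_probs[latent]]
        members.append((latent, responses))
    return mixing, members


rng = random.Random(42)
alpha = [2.0, 1.0]  # hypothetical Dirichlet parameters
item_probs = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3]]  # two classes, three items
mixing, members = simulate_cluster(5, alpha, item_probs, rng)
print(mixing)
print(members)
```

Members of the same cluster share one draw of the mixing probabilities, which is what induces within-cluster dependence in their latent classes.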

The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of sleep and its impacts on health outcomes. A primary metric of the SHHS is the in-home polysomnogram, which includes two electroencephalographic (EEG) channels for each subject, at two visits. The volume and importance of these data present enormous challenges for analysis.

We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep.
