Evidence synthesis involves drawing conclusions from trial samples that may differ from the target population of interest, and there is often heterogeneity among trials in sample characteristics, treatment implementation, study design, and assessment of covariates. Stitching together this patchwork of evidence requires subject-matter knowledge, a clearly defined target population, and guidance on how to weigh evidence from different trials. Transportability analysis has provided formal identifiability conditions required to make unbiased causal inference in the target population.
Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but from which covariate information can be obtained. In such analyses, a key practical challenge is the presence of systematically missing data when some trials have collected data on one or more baseline covariates, but other trials have not, such that the covariate information is missing for all participants in the latter. In this article, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis.
Ultrahigh- and high-dimensional data are common in regression analysis in various fields, such as omics, finance, and biological engineering. In addition to the problem of dimension, the data might also be contaminated. There are two main types of contamination: outliers and model misspecification.
To increase power and minimize bias in statistical analyses, quantitative outcomes are often adjusted for precision and confounding variables using standard regression approaches. The outcome is modeled as a linear function of the precision variables and confounders; however, for many complex phenotypes, the assumptions of the linear regression models are not always met. As an alternative, we used neural networks for the modeling of complex phenotypes and covariate adjustments.
In correlated data settings, analysts typically choose between fitting conditional and marginal models, whose parameters come with distinct interpretations, and as such the choice between the two should be made on scientific grounds. For settings where interest lies in marginal (or population-averaged) parameters, the question of how best to estimate those parameters is a statistical one, and analysts have at their disposal two distinct modeling frameworks: generalized estimating equations (GEE) and marginalized multilevel models (MMMs). The two have been contrasted theoretically and in large sample settings, but asymptotic theory provides no guarantees in the small sample settings that are commonplace.
In the research on complex diseases, gene expression (GE) data have been extensively used for clustering samples. The clusters so generated can serve as the basis for disease subtype identification, risk stratification, and many other purposes. With the small sample sizes of genetic profiling studies and the noisy nature of GE data, clustering analysis results are often unsatisfactory.