Methods for handling missing data in clinical psychology studies are reviewed. Missing data are defined, and a taxonomy of main approaches to analysis is presented, including complete-case and available-case analysis, weighting, maximum likelihood, Bayes, single and multiple imputation, and augmented inverse probability weighting. Missingness mechanisms, which play a key role in the performance of alternative methods, are defined.
Selection bias is a serious potential problem for inference about relationships of scientific interest based on samples without well-defined probability sampling mechanisms. Motivated by the potential for selection bias in: (a) estimated relationships of polygenic scores (PGSs) with phenotypes in genetic studies of volunteers and (b) estimated differences in subgroup means in surveys of smartphone users, we derive novel measures of selection bias for estimates of the coefficients in linear and probit regression models fitted to nonprobability samples, when aggregate-level auxiliary data are available for the selected sample and the target population. The measures arise from normal pattern-mixture models that allow analysts to examine the sensitivity of their inferences to assumptions about nonignorable selection in these samples.
Importance: Amyotrophic lateral sclerosis (ALS) has an immune component, but previous human studies have not examined immune changes over time.
Objectives: To assess peripheral inflammatory markers in participants with ALS and healthy control individuals and to track immune changes in ALS and determine whether these changes correlate with disease progression.
Design, Setting, And Participants: In this longitudinal cohort study, leukocytes were isolated from peripheral blood samples from 35 controls and 119 participants with ALS at the ALS Clinic of the University of Michigan, Ann Arbor, from June 18, 2014, through May 26, 2016.
Background: The potential impact of missing data on the results of clinical trials has received heightened attention recently. A National Research Council study provides recommendations for limiting missing data in clinical trial design and conduct, and principles for analysis, including the need for sensitivity analyses to assess robustness of findings to alternative assumptions about the missing data. A Food and Drug Administration advisory committee raised missing data as a serious concern in their review of results from the ATLAS ACS 2 TIMI 51 study, a large clinical trial that assessed rivaroxaban for its ability to reduce the risk of cardiovascular death, myocardial infarction or stroke in patients with acute coronary syndrome.
A case study is presented assessing the impact of missing data on the analysis of daily diary data from a study evaluating the effect of a drug for the treatment of insomnia. The primary analysis averaged daily diary values for each patient into a weekly variable. Following the commonly used approach, missing daily values within a week were ignored provided there was a minimum number of diary reports (i.
Lifetime Data Anal
July 2015
Missing values in predictors are a common problem in survival analysis. In this paper, we review estimation methods for accelerated failure time models with missing predictors, and apply a new method called subsample ignorable likelihood (IL; Little and Zhang, J R Stat Soc 60:591-605, 2011) to this class of models. The approach applies a likelihood-based method to a subsample of observations that are complete on a subset of the covariates, chosen based on assumptions about the missing data mechanism.
Objectives: To recommend methodological standards in the prevention and handling of missing data for primary patient-centered outcomes research (PCOR).
Study Design And Setting: We searched National Library of Medicine Bookshelf and Catalog as well as regulatory agencies' and organizations' Web sites in January 2012 for guidance documents that had formal recommendations regarding missing data. We extracted the characteristics of included guidance documents and recommendations.
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA.
Background: Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed.
Biometrics
September 2012
We consider the linear regression of outcome Y on regressors W and Z with some values of W missing, when our main interest is the effect of Z on Y, controlling for W. Three common approaches to regression with missing covariates are (i) complete-case analysis (CC), which discards the incomplete cases; (ii) ignorable likelihood methods, which base inference on the likelihood of the observed data, assuming the missing data are missing at random (Rubin, 1976b); and (iii) nonignorable modeling, which posits a joint distribution of the variables and missing data indicators. Another simple practical approach that has not received much theoretical attention is to drop the regressor variables containing missing values from the regression modeling (DV, for drop variables).
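The contrast between CC and DV can be illustrated with a small simulation. This is a minimal sketch, not the analysis from the paper: the data-generating model, the coefficient values, the 40% missingness rate, and the assumption that W is missing completely at random are all illustrative choices, and the helper `z_coefficient` is a hypothetical name.

```python
import random

def z_coefficient(y, z, w=None):
    """OLS coefficient of z when regressing y on z (and on w, if supplied)."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    szz = sum((a - mz) ** 2 for a in z)
    szy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    if w is None:
        return szy / szz  # simple regression of y on z
    mw = sum(w) / n
    sww = sum((a - mw) ** 2 for a in w)
    szw = sum((a - mz) * (b - mw) for a, b in zip(z, w))
    swy = sum((a - mw) * (b - my) for a, b in zip(w, y))
    # Solve the 2x2 normal equations for the centered regressors (z, w).
    return (sww * szy - szw * swy) / (szz * sww - szw ** 2)

# Illustrative data (assumed model): Y = 1 + 1*Z + 1*W + noise, W correlated with Z.
rng = random.Random(1)
rows = []
for _ in range(4000):
    z = rng.gauss(0, 1)
    w = 0.5 * z + rng.gauss(0, 1)
    y = 1.0 + 1.0 * z + 1.0 * w + rng.gauss(0, 1)
    # Assumption: W is missing completely at random with probability 0.4.
    w_obs = w if rng.random() > 0.4 else None
    rows.append((y, z, w_obs))

# CC: keep only cases with W observed, then regress Y on Z and W.
complete = [r for r in rows if r[2] is not None]
cc_estimate = z_coefficient([r[0] for r in complete],
                            [r[1] for r in complete],
                            [r[2] for r in complete])

# DV: drop W from the model and regress Y on Z using all cases.
dv_estimate = z_coefficient([r[0] for r in rows], [r[1] for r in rows])

print(f"CC estimate of Z effect: {cc_estimate:.2f}")
print(f"DV estimate of Z effect: {dv_estimate:.2f}")
```

Under these assumed settings CC targets the effect of Z controlling for W (1.0 here), while DV targets the unadjusted effect (about 1.5 here, because W is correlated with Z). Other missingness mechanisms can change which approach performs better.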
Community Dent Oral Epidemiol
October 2011
Objectives: This pragmatic randomized trial evaluated the effectiveness of a tailored educational intervention on oral health behaviors and new untreated carious lesions in low-income African-American children in Detroit, Michigan.
Methods: Participating families were recruited in a longitudinal study of the determinants of dental caries in 1021 randomly selected children (0-5 years) and their caregivers. The families were examined at baseline in 2002-2004 (Wave I), 2004-2005 (Wave II) and 2007 (Wave III).
J R Stat Soc Ser C Appl Stat
May 2010
Data analysis for randomized trials including multi-treatment arms is often complicated by subjects who do not comply with their treatment assignment. We discuss here methods of estimating treatment efficacy for randomized trials involving multi-treatment arms subject to non-compliance. One treatment effect of interest in the presence of non-compliance is the complier average causal effect (CACE) (Angrist et al.
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this article, we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed.
We consider the estimation of the regression of an outcome Y on a covariate X, where X is unobserved, but a variable W that measures X with error is observed. A calibration sample that measures pairs of values of X and W is also available; we consider calibration samples where Y is measured (internal calibration) and not measured (external calibration). One common approach for measurement error correction is Regression Calibration (RC), which substitutes the unknown values of X by predictions from the regression of X on W estimated from the calibration sample.
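The RC substitution step described above can be sketched as follows. This is a minimal illustration of plain regression calibration with an external calibration sample, not the method proposed in the paper: the simulated model (X standard normal, W equal to X plus independent unit-variance error, Y = 1 + 2X + noise), the sample sizes, and the helper names `ols_simple` and `regression_calibration` are all assumptions made for the example.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def regression_calibration(w_main, y_main, x_calib, w_calib):
    """Plain RC: predict X from W using the calibration fit, then regress Y on the prediction."""
    a, b = ols_simple(w_calib, x_calib)      # step 1: regress X on W (calibration sample)
    x_hat = [a + b * w for w in w_main]      # step 2: substitute predictions for unobserved X
    return ols_simple(x_hat, y_main)         # step 3: regress Y on predicted X (main sample)

rng = random.Random(0)

# Main sample: only (W, Y) are recorded; X is generated but then discarded.
w_main, y_main = [], []
for _ in range(5000):
    x = rng.gauss(0, 1)
    w_main.append(x + rng.gauss(0, 1))       # assumed measurement error model
    y_main.append(1.0 + 2.0 * x + rng.gauss(0, 1))

# External calibration sample: pairs (X, W) only, Y not measured.
x_calib, w_calib = [], []
for _ in range(1000):
    x = rng.gauss(0, 1)
    x_calib.append(x)
    w_calib.append(x + rng.gauss(0, 1))

_, naive_slope = ols_simple(w_main, y_main)  # ignores measurement error
_, rc_slope = regression_calibration(w_main, y_main, x_calib, w_calib)

print(f"naive slope of Y on W: {naive_slope:.2f}")
print(f"RC slope of Y on X:    {rc_slope:.2f}")
```

In this assumed setup the naive slope is attenuated toward about half the true value of 2, and RC roughly restores it. The sketch produces point estimates only and says nothing about whether the accompanying standard errors are valid.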
In this paper, the authors describe a simple method for making longitudinal comparisons of alternative markers of a subsequent event. The method is based on the aggregate prediction gain from knowing whether or not a marker has occurred at any particular age. An attractive feature of the method is the exact decomposition of the measure into 2 components: 1) discriminatory ability, which is the difference in the mean time to the subsequent event for individuals for whom the marker has and has not occurred, and 2) prevalence factor, which is related to the proportion of individuals who are positive for the marker at a particular age.
In longitudinal studies of developmental and disease processes, participants are followed prospectively with intermediate milestones identified as they occur. Frequently, studies enroll participants over a range of ages including ages at which some participants' milestones have already passed. Ages at milestones that occur prior to study entry are left censored if individuals are enrolled in the study or left truncated if they are not.
J R Stat Soc Ser C Appl Stat
November 2010
Background: The Internet provides us with tools (user metrics or paradata) to evaluate how users interact with online interventions. Analysis of these paradata can lead to design improvements.
Objective: The objective was to explore the qualities of online participant engagement in an online intervention.