A cautionary tale on using imputation methods for inference in matched-pairs design.

Bioinformatics

Faculty of Statistics, Institute of Mathematical Statistics and Applications in Industry, Technical University of Dortmund, Dortmund 44227, Germany.

Published: May 2020

Motivation: Imputation procedures have become standard statistical practice in biomedical fields, since subsequent analyses can then be conducted as if no values had been missing. In particular, non-parametric imputation schemes such as random forest imputation have shown favorable imputation performance compared with the more traditionally used MICE procedure. However, their effect on valid statistical inference has not yet been analyzed. This article closes this gap by investigating their validity for inferring mean differences in incompletely observed pairs and by comparing them with a recent approach that works only with the observations at hand.

Results: Our findings indicate that machine-learning schemes for (multiply) imputing missing values may inflate the type I error rate or yield comparatively low power in small-to-moderate samples of matched pairs, even after the test statistics are modified according to Rubin's multiple imputation rules. In addition to an extensive simulation study, an illustrative data example from a breast cancer gene study is considered.
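Rubin's rules, mentioned above, pool the results of m analyses run on m completed (imputed) data sets into one test. The following is a generic sketch in Python using only the standard library, not the authors' R code. It assumes the imputation step (e.g. MICE or random forest imputation) has already produced the m completed data sets, that the paired mean difference in each is estimated by the mean of the within-pair differences with squared standard error s_d^2/n, and that m >= 2. The function names `paired_estimate` and `rubin_pool` are hypothetical.

```python
import statistics
from math import sqrt


def paired_estimate(x, y):
    """Mean of the pair differences and its squared standard error for ONE completed data set."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d), statistics.variance(d) / len(d)


def rubin_pool(estimates, variances):
    """Pool m point estimates and within-imputation variances with Rubin's rules (m >= 2)."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)   # pooled point estimate
    u_bar = statistics.mean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    t_var = u_bar + (1 + 1 / m) * b      # total variance
    # Rubin's (1987) degrees of freedom for the reference t distribution;
    # if the imputations agree exactly (b == 0), fall back to infinite df.
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2 if b > 0 else float("inf")
    t_stat = q_bar / sqrt(t_var)         # test statistic for H0: mean difference = 0
    return q_bar, t_var, t_stat, df
```

A p-value could then be taken from a t distribution with `df` degrees of freedom (for instance via `scipy.stats.t.sf`). The abstract's finding is that, after machine-learning imputation, even such pooled tests may not hold the nominal type I error level in small samples.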

Availability And Implementation: The corresponding R code can be obtained from the authors, and the gene expression data can be downloaded at www.gdac.broadinstitute.org.

Supplementary Information: Supplementary data are available at Bioinformatics online.

Source: http://dx.doi.org/10.1093/bioinformatics/btaa082