Secondary datasets are used in healthcare research because of their cost advantages, convenience, and size. However, missing data can cause problems that are difficult to resolve. This manuscript reviews possible causes of missing data and ways to address them. Many researchers use multiple imputation as a solution, which consists of three phases: (a) the imputation phase, (b) the analysis phase, and (c) the pooling phase. When missing data are caused by a refusal to answer or by insufficient knowledge, multiple imputation works well. However, difficulties arise with screening questions. If a respondent does not answer a screening question, the true answer could be either "yes" or "no." This paper suggests identifying "yes" responses on the screening question and setting them aside for use in the analysis. The reasons for this approach are the impossibility of conducting multiple imputation twice, the problem of basing imputation on the population after sample weights are applied, and the risk of producing logical errors in estimation during the imputation phase. As an example, this manuscript presents the techniques used to address missing data from screening questions in a national US dataset. These multiple imputation techniques, illustrated with examples from the dataset, could be used by researchers in future healthcare research that relies on secondary datasets.
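
The three phases named in the abstract can be sketched in a few lines of Python. This is a generic illustration, not the authors' procedure: the simulated variables `x` and `y`, the 30% missingness rate, the number of imputations `m`, and the helper `pool_rubin` are all assumptions made for the example. The imputation step uses a deliberately simple stochastic regression model with fixed coefficients, which understates uncertainty compared with a full multiple imputation routine. Sample weights and screening-question skip logic are not modeled here.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                  # pooled point estimate
    u_bar = u.mean()                  # within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    total = u_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
    return q_bar, total

# Simulated data (hypothetical): y depends on x, and about 30% of y is missing.
rng = np.random.default_rng(0)
n, m = 200, 5
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
missing = rng.random(n) < 0.3
y_obs = np.where(missing, np.nan, y)

# Fit the imputation model on complete cases only.
obs = ~missing
slope, intercept = np.polyfit(x[obs], y_obs[obs], 1)
resid_sd = np.std(y_obs[obs] - (intercept + slope * x[obs]), ddof=2)

estimates, variances = [], []
for _ in range(m):
    # (a) Imputation phase: fill each gap with a prediction plus random noise.
    y_imp = y_obs.copy()
    noise = rng.normal(scale=resid_sd, size=missing.sum())
    y_imp[missing] = intercept + slope * x[missing] + noise
    # (b) Analysis phase: estimate the mean of y and its squared standard error.
    estimates.append(y_imp.mean())
    variances.append(y_imp.var(ddof=1) / n)

# (c) Pooling phase: combine the m results into one estimate.
pooled_mean, pooled_var = pool_rubin(estimates, variances)
print(f"pooled mean = {pooled_mean:.3f}, SE = {pooled_var ** 0.5:.3f}")
```

The pooled variance is never smaller than the average within-imputation variance, because the between-imputation term adds the uncertainty that comes from not knowing the missing values.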

Source
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC9069597
DOI: http://dx.doi.org/10.1177/00469580221088627
