Outlying observations and missing values: how should they be handled?

Clin Exp Pharmacol Physiol

Department of Surgery, The University of Melbourne, Melbourne, Victoria, Australia.

Published: May 2008

1. The problems of, and best solutions for, outlying observations and missing values depend strongly on the sizes of the experimental groups. For original articles published in Clinical and Experimental Pharmacology and Physiology during 2006-2007, group sizes ranged from three to 44 ('small groups'). In surveys, epidemiological studies and clinical trials, group sizes range from hundreds to thousands ('large groups').

2. How can one detect outlying (extreme) observations? The best methods are graphical, for instance: (i) a scatterplot, often with mean ± 2 s; and (ii) a box-and-whisker plot. Even with these, it is a matter of judgement whether observations are truly outlying.

3. It is permissible to delete or replace outlying observations if an independent explanation for them can be found. This may be, for instance, failure of a piece of measuring equipment or human error in operating it. If the observation is deleted, it can then be treated as a missing value. Rarely, the appropriate portion of the study can be repeated.

4. It is decidedly not permissible to delete unexplained extreme values. Some of the acceptable strategies for handling them are: (i) transform the data and proceed with conventional statistical analyses; (ii) use the mean for location, but use permutation (randomization) tests for comparing means; and (iii) use robust methods for describing location (e.g. median, geometric mean, trimmed mean), for indicating dispersion (range, percentiles), for comparing locations and for regression analysis.

5. What can be done about missing values? Some strategies are: (i) ignore them; (ii) replace them by hand if the data set is small; and (iii) use computerized imputation techniques to replace them if the data set is large (e.g. regression or EM (conditional expectation, maximum likelihood estimation) methods).

6. If the missing values are ignored, or even if they are replaced, it is essential to test whether the individuals with missing values are otherwise indistinguishable from the remainder of the group. If the missing values have not occurred at random, but are associated with some property of the individuals being studied, the subsequent analysis may be biased.
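
As an illustration only, the strategies named in points 2 and 4 can be sketched in Python for a small group. This is not the author's code: the function names (`tukey_outliers`, `permutation_test_means`, `trimmed_mean`), the 1.5 × IQR fence multiplier and the default number of permutations are assumptions chosen for the example. The fences correspond to a common box-and-whisker convention, and, as the abstract stresses, a flagged value is only a candidate for scrutiny, not for deletion.

```python
import random
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside the box-and-whisker fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is a common convention (an assumption here), not a rule from the abstract.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def permutation_test_means(a, b, n_perm=10000, seed=0):
    """Two-sided permutation (randomization) test for a difference in group means.

    Group labels are reshuffled n_perm times; the p-value is the share of
    reshuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Adding 1 to both terms keeps the estimated p-value above zero.
    return (count + 1) / (n_perm + 1)


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return statistics.mean(ordered[k:len(ordered) - k]) if k else statistics.mean(ordered)
```

With made-up measurements such as `[4.1, 4.4, 4.6, 4.8, 5.0, 5.2, 5.3, 12.9]`, `tukey_outliers` flags only 12.9, while `statistics.median` and `trimmed_mean(..., 0.125)` describe location without that value dominating the result. The permutation test keeps the mean as the measure of location but does not rely on a normal distribution for the comparison.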

Source
http://dx.doi.org/10.1111/j.1440-1681.2007.04860.x