Cautionary Remarks on the Use of Clusterwise Regression.

Michael J Brusco, J Dennis Cradit, Douglas Steinley, Gavin L Fox

Multivariate Behav Res

Published: January 2016

Clusterwise linear regression is a multivariate statistical procedure that attempts to cluster objects with the objective of minimizing the sum of the error sums of squares for the within-cluster regression models. In this article, we show that the minimization of this criterion makes no effort to distinguish the error explained by the within-cluster regression models from the error explained by the clustering process. In some cases, most of the variation in the response variable is explained by clustering the objects, with little additional benefit provided by the within-cluster regression models. Accordingly, there is tremendous potential for overfitting with clusterwise regression, which is demonstrated with numerical examples and simulation experiments. To guard against the misuse of clusterwise regression, we recommend a benchmarking procedure that compares the results for the observed empirical data with those obtained across a set of random permutations of the response measures. We also demonstrate the potential for overfitting via an empirical application related to the prediction of reflective judgment using high school and college performance measures.
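The benchmarking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clusterwise regression algorithm is used, so the code below assumes a simple alternating heuristic (fit one ordinary least squares model per cluster, reassign each object to the cluster whose model gives it the smallest squared residual, repeat from several random starts). The function names `clusterwise_sse` and `permutation_benchmark`, and all parameter defaults, are hypothetical.

```python
import numpy as np

def _fit_residuals(Xd, y, labels, k):
    """Squared residual of every object under each cluster's OLS fit."""
    n, p = Xd.shape
    resid2 = np.full((n, k), np.inf)
    for c in range(k):
        members = labels == c
        if members.sum() <= p:  # assumption: skip clusters too small to fit
            continue
        beta, *_ = np.linalg.lstsq(Xd[members], y[members], rcond=None)
        resid2[:, c] = (y - Xd @ beta) ** 2
    return resid2

def clusterwise_sse(X, y, k, n_starts=10, max_iter=50, seed=0):
    """Heuristic clusterwise regression; returns (best total SSE, labels)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # prepend an intercept column
    best_sse, best_labels = np.inf, None
    for _ in range(n_starts):
        labels = rng.integers(0, k, size=n)  # random initial partition
        for _ in range(max_iter):
            resid2 = _fit_residuals(Xd, y, labels, k)
            new_labels = resid2.argmin(axis=1)  # move to best-fitting cluster
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        # Refit on the final partition and sum the within-cluster SSEs.
        resid2 = _fit_residuals(Xd, y, labels, k)
        sse = resid2[np.arange(n), labels].sum()
        if sse < best_sse:
            best_sse, best_labels = sse, labels.copy()
    return best_sse, best_labels

def permutation_benchmark(X, y, k, n_perm=20, n_starts=10, seed=0):
    """Compare the observed SSE with SSEs for randomly permuted responses."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed, _ = clusterwise_sse(X, y, k, n_starts=n_starts, seed=seed)
    permuted = np.array([
        clusterwise_sse(X, rng.permutation(y), k,
                        n_starts=n_starts, seed=seed)[0]
        for _ in range(n_perm)
    ])
    return observed, permuted
```

Under these assumptions, a call such as `observed, permuted = permutation_benchmark(X, y, k=3)` returns the criterion value for the observed data alongside its values for permuted responses. Permuting the response breaks any real link between predictors and response, so if `observed` is not clearly smaller than the values in `permuted`, the apparent fit is plausibly due to clustering on the response alone rather than to the within-cluster regression models.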

DOI: http://dx.doi.org/10.1080/00273170701836653

