Clustered R-squared Heat-Maps in R

Wednesday July 24, 2013

Sometimes you want to quickly see how much variables are related to one another (linearly, here). You might be thinking about doing factor analysis or some such thing. You'd like to see if the variables can be neatly separated into various groups, perhaps. Here's a function that I wrote for this purpose:

clusterRsquared <- function(dataframe) {
  dissimilarity <- 1 - cor(dataframe)^2
  clustering <- hclust(as.dist(dissimilarity))
  order <- clustering$order
  oldpar <- par(no.readonly=TRUE); par(mar=c(0,0,0,0))
  image(dissimilarity[order, rev(order)], axes=FALSE)
  par(oldpar)
  return(1 - dissimilarity[order, order])
}

Call it like this, for example:

round(clusterRsquared(mtcars),2)

You'll get output like this:

     drat   am gear  mpg   wt   hp  cyl disp carb qsec   vs
drat 1.00 0.51 0.49 0.46 0.51 0.20 0.49 0.50 0.01 0.01 0.19
am   0.51 1.00 0.63 0.36 0.48 0.06 0.27 0.35 0.00 0.05 0.03
gear 0.49 0.63 1.00 0.23 0.34 0.02 0.24 0.31 0.08 0.05 0.04
mpg  0.46 0.36 0.23 1.00 0.75 0.60 0.73 0.72 0.30 0.18 0.44
wt   0.51 0.48 0.34 0.75 1.00 0.43 0.61 0.79 0.18 0.03 0.31
hp   0.20 0.06 0.02 0.60 0.43 1.00 0.69 0.63 0.56 0.50 0.52
cyl  0.49 0.27 0.24 0.73 0.61 0.69 1.00 0.81 0.28 0.35 0.66
disp 0.50 0.35 0.31 0.72 0.79 0.63 0.81 1.00 0.16 0.19 0.50
carb 0.01 0.00 0.08 0.30 0.18 0.56 0.28 0.16 1.00 0.43 0.32
qsec 0.01 0.05 0.05 0.18 0.03 0.50 0.35 0.19 0.43 1.00 0.55
vs   0.19 0.03 0.04 0.44 0.31 0.52 0.66 0.50 0.32 0.55 1.00

And a plot like this, which is much easier to start investigating visually:

Is this necessarily the One True Clustering? No, but it isn't terribly bad either.

This post was originally hosted elsewhere.