Friday, March 13, 2009

Visualization of a correlation matrix

  • Color Image
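data(mtcars)
## correlation matrix of the estimated regression coefficients
## (11 x 11, intercept included)
fit = lm(mpg ~ ., mtcars)
cor = summary(fit, correlation = TRUE)$correlation
## reverse the rows and transpose so that image() draws row 1 at the top
cor2 = t(cor[11:1, ])
## 11 colors from dark red (-1) through white (0) to dark blue (+1)
colors = c("#A50F15", "#DE2D26", "#FB6A4A", "#FCAE91", "#FEE5D9",
    "white", "#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C")
## zlim = c(-1, 1) keeps white centred on zero
image(1:11, 1:11, cor2, axes = FALSE, ann = FALSE, col = colors,
    zlim = c(-1, 1))
## print each coefficient (x 100) in its cell
text(rep(1:11, 11), rep(1:11, each = 11), round(100 * cor2))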
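The row reversal and transpose in cor2 = t(cor[11:1, ]) are needed because image() draws z[1, 1] in the bottom-left corner and runs the first index along the x axis. Here is a small check of that orientation, using a hypothetical helper flip_for_image() that simply wraps the same expression:

```r
# Hypothetical helper mirroring the post's t(cor[11:1, ]) step.
# image(x, y, z) draws z[i, j] at (x[i], y[j]), with z[1, 1] in the
# bottom-left corner. Reversing the rows and transposing puts matrix
# row 1 at the top and matrix column j at x = j (normal reading order).
flip_for_image <- function(m) t(m[nrow(m):1, , drop = FALSE])

m <- matrix(1:4, nrow = 2)   # rows are (1, 3) and (2, 4)
f <- flip_for_image(m)
f[1, 2]   # drawn in the top-left cell: equals m[1, 1]
f[2, 1]   # drawn in the bottom-right cell: equals m[2, 2]
```

With any square matrix it would be used as image(1:n, 1:n, flip_for_image(cor), col = colors, zlim = c(-1, 1)).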
  • Ellipses
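library(ellipse)
## reuses cor and colors from the color-image example;
## one color per ellipse, ranked within each column of the matrix
col = colors[as.vector(apply(cor, 2, rank))]
plotcorr(cor, col = col, mar = rep(0, 4))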
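Ranking within each column, as above, does not preserve the sign of a coefficient in its color. If a sign-aware mapping is preferred, one option is to cut [-1, 1] into as many equal-width bins as there are colors. The sketch below does that with a hypothetical helper, cor_to_color(), built on base R's findInterval(). It is an alternative, not what the ellipse plot above uses.

```r
# Hypothetical helper (not the post's method): map correlations in [-1, 1]
# onto a diverging palette so the color reflects sign and magnitude,
# with the middle color of an odd-length palette covering values near 0.
cor_to_color <- function(r, palette) {
  k <- length(palette)
  breaks <- seq(-1, 1, length.out = k + 1)            # k equal-width bins
  idx <- findInterval(r, breaks, all.inside = TRUE)   # bin index in 1..k
  palette[idx]
}

colors <- c("#A50F15", "#DE2D26", "#FB6A4A", "#FCAE91", "#FEE5D9",
            "white", "#EFF3FF", "#BDD7E7", "#6BAED6", "#3182BD", "#08519C")
cols <- cor_to_color(c(-1, -0.5, 0, 0.5, 1), colors)
cols
```

It could then be passed as plotcorr(cor, col = cor_to_color(cor, colors), mar = rep(0, 4)). Because findInterval() drops the matrix dimensions, the colors come back in the same column-major order as as.vector(cor).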
  • Taiyun's circles (my method)

circle.cor = function(cor, axes = FALSE, xlab = "",
    ylab = "", asp = 1, title = "Taiyun's cor-matrix circles",
    ...) {
    n = nrow(cor)
    par(mar = c(0, 0, 2, 0), bg = "white")
    plot(c(0, n + 0.8), c(0, n + 0.8), axes = axes, xlab = xlab,
        ylab = ylab, asp = asp, type = "n")
    ## add grid: n + 1 horizontal and n + 1 vertical lines
    segments(rep(0.5, n + 1), 0.5 + 0:n, rep(n + 0.5, n + 1),
        0.5 + 0:n, col = "gray")
    segments(0.5 + 0:n, rep(0.5, n + 1), 0.5 + 0:n, rep(n + 0.5,
        n + 1), col = "gray")
    ## define the circles' background color:
    ## black for positive correlation coefficients and white for negative
    bg = cor
    bg[cor > 0] = "black"
    bg[cor <= 0] = "white"
    ## plot n*n circles using vector language, suggested by Yihui Xie;
    ## a radius of sqrt(|r|)/2 makes each circle's area proportional to |r|
    symbols(rep(1:n, each = n), rep(n:1, n), add = TRUE, inches = FALSE,
        circles = as.vector(sqrt(abs(cor))/2), bg = as.vector(bg))
    ## label rows (left) and columns (top) by index
    text(rep(0, n), 1:n, n:1, col = "red")
    text(1:n, rep(n + 1, n), 1:n, col = "red")
    title(title)
}

## an example
data(mtcars)
fit = lm(mpg ~ ., mtcars)
cor = summary(fit, correlation = TRUE)$correlation
circle.cor(cor)

Circles with a black fill denote positive correlation coefficients, white circles denote negative (or zero) ones, and the area of each circle is proportional to the absolute value of the coefficient. See more in my Picasa album here.
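In circle.cor() the radius is sqrt(abs(cor))/2, which is what makes the area (rather than the radius) track the absolute correlation. A quick numeric check, with hypothetical helper names:

```r
# Radius rule used in circle.cor(): sqrt(|r|) / 2. The largest circle
# (|r| = 1) has radius 0.5, so it just fills its unit grid cell.
circle_radius <- function(r) sqrt(abs(r)) / 2

# Area = pi * radius^2 = pi * |r| / 4, i.e. proportional to |r|.
circle_area <- function(r) pi * circle_radius(r)^2

# Relative areas: a coefficient of -0.25 gets a quarter of the full area.
ratios <- circle_area(c(-0.25, 0.5, 1)) / circle_area(1)
ratios
```

Using abs(cor) directly as the radius would instead make a coefficient of 0.5 look four times smaller than 1 in area, which exaggerates differences.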

All three graphs above are based on the same data. Dear friends, which one gives you more information at first glance?

19 comments:

  1. Very nicely done! Good to see you are an R user as well. I wonder if you perhaps saw my post on optimising R here?

  2. Thanks for the compliment:)
    Yes, I happened to see your blog last night, and I really learned a lot from it:)

  3. What I like most about your circle plot for the correlation matrix is its simplicity. Besides, I think you can generalize this plot to the distance matrix, which can be used to demonstrate cluster analysis. Correlation is also a kind of distance.

  4. Good work. The graph is now on the graph gallery

    One possible enhancement would be to show somehow whether each correlation is significantly different from zero (cor.test), although this would need to take the data as input rather than just the correlation matrix.

  5. I'll add to the compliments: I've been wondering about good graphical presentations of correlation matrices, and this is the best I've seen.

    One suggestion: at Revolutions they pointed out that the white circles disappear on the white background. Could you simply change the background to "grey50"? I tried it last night and I (at least) like it.

  6. To Bob O'Hara: Thanks for the comment on my blog. I appreciate your wonderful idea! I have now put some different graphs of correlation matrix circles on my Picasa: http://picasaweb.google.com/WeiTaiyun/CorrelationMatrixCircles#
    Any comments are welcome.

  7. I like the last of the three plots---it's simple yet elegant and most suitable for print (b&w).

  8. Dear Taiyun,

    Nice correlation plots on your Picasa. Do you have the R code for generating these?

    Thanks,
    Ravi.

  9. To Ravi:
    Yes, I have. But I don't know your email address and can't find your website. My email is weitaiyun[at]gmail.com

  10. Hi - It is a well known psychophysical phenomenon that humans cannot perceive area very accurately. You are better off using luminance or hue.

  11. Hi Taiyun,

    Thanks for creating this blog... it gives a vivid pictorial depiction of the correlation matrix.
    Thanks also for sending me the image corresponding to the Boston Housing data set. I plan to use it for an upcoming presentation.

  12. I like the last one. I am trying to use it with my data, which has over 60 variables, but I am getting stuck somewhere... Could you help me out? jorge.eco.ramos at gmail dot com

  13. Pretty nice article on visualizing correlation matrices. Thanks for contributing.

  14. Thanks! How about adding an option to display a correlation matrix similar to http://www.mathworks.com/help/econ/corrplot.html , which shows both the correlation coefficient and the correlation plot in each cell of the matrix?

    1. library(corrplot)
      ?corrplot

      Notice addCoef.col

  15. need help?
    Contact me via gaelkim7@gmail

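Comment 4 suggests showing whether each correlation differs significantly from zero, which needs the raw data rather than the correlation matrix alone. The following is a minimal sketch. It assumes pairwise Pearson correlations between the columns of mtcars (that is, cor(mtcars)), not the coefficient correlations plotted in the post, and it uses a hypothetical helper, cor_pvalues(), built only on base R's cor.test():

```r
# Hypothetical helper: matrix of pairwise cor.test() p-values for the
# columns of a data frame. Extra arguments (e.g. method = "spearman")
# are passed on to cor.test().
cor_pvalues <- function(data, ...) {
  data <- as.matrix(data)
  n <- ncol(data)
  p <- matrix(NA_real_, n, n,
              dimnames = list(colnames(data), colnames(data)))
  diag(p) <- 0   # convention: a variable is perfectly correlated with itself
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # the test is symmetric, so fill both triangles at once
      p[i, j] <- p[j, i] <- cor.test(data[, i], data[, j], ...)$p.value
    }
  }
  p
}

data(mtcars)
p <- cor_pvalues(mtcars)
round(p["mpg", c("cyl", "wt", "qsec")], 4)
```

Cells whose p-value exceeds a chosen level (say 0.05) could then be drawn differently, for example left empty or with a lighter fill. Note that with 55 pairs, some correction for multiple testing may be worth considering.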