Add variance to achieve a given correlation

Saturday November 28, 2020

If you have data x and you want some data y that has a given nonzero correlation r with x, multiply x by the sign of r and then add independent random noise. The noise variance should be the original variance divided by the square of the desired correlation, minus the original variance.

You can do this in R, for example:

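correlated <- function(x, r) {
  stopifnot(r != 0, abs(r) <= 1)  # r = 0 would need infinite noise variance
  # Noise variance from the formula: var(x) / r^2 - var(x)
  noise_sd <- sqrt(var(x) / r^2 - var(x))
  sign(r) * x + rnorm(length(x), sd = noise_sd)
}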

x <- rnorm(100)
cor(x, correlated(x, 0.75))
## [1] 0.7537228

The resulting sample correlation is only equal to the desired value in expectation, so any single run will land a little above or below it. If you need it closer, you can adjust a value in the result by hand, or use a more sophisticated method.
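One way to get the sample correlation to match exactly is to make the noise exactly uncorrelated with x in the sample before mixing the two. The following is a minimal sketch of that idea, not the method from this post: correlated_exact is a hypothetical helper name, and it uses regression residuals as the noise. Note that it returns y standardized to mean zero and unit variance, so rescale afterwards if the scale matters.

```r
# Hypothetical helper (an assumption, not part of the original method):
# returns y whose sample correlation with x equals r up to floating-point error.
correlated_exact <- function(x, r) {
  stopifnot(abs(r) <= 1, length(x) > 2, sd(x) > 0)
  noise <- rnorm(length(x))
  # Residuals from regressing noise on x have zero sample mean and
  # zero sample covariance with x.
  e <- unname(residuals(lm(noise ~ x)))
  zx <- (x - mean(x)) / sd(x)  # standardized x
  ze <- e / sd(e)              # standardized, x-orthogonal noise
  # Mix so that var(y) = r^2 + (1 - r^2) = 1 and cov(zx, y) = r.
  r * zx + sqrt(1 - r^2) * ze
}

set.seed(1)
x <- rnorm(100)
y <- correlated_exact(x, 0.75)
cor(x, y)
```

Because correlation ignores location and scale, standardizing x inside the helper does not change the correlation with the original x.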

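The method works because of two facts. First, when y is x plus independent noise, the squared correlation between x and y equals the coefficient of determination, the share of the variance of y that x accounts for. Second, by the variance sum law, the variance of x plus independent noise is the sum of the two variances.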

\[ r^2 = R^2 = 1 - \frac{ \sigma^2_\text{noise} }{ \sigma^2_{x + \text{noise}} } = \frac{ \sigma^2_x }{ \sigma^2_x + \sigma^2_\text{noise} } \]

\[ \sigma^2_\text{noise} = \frac{\sigma^2_x}{r^2} - \sigma^2_x \]
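As a cross-check that does not go through the coefficient of determination, the same result can be read off the covariance directly. This sketch assumes the noise is independent of x and writes y for the output, y = sign(r) x + noise (the label y is only for illustration here):

\[ \operatorname{cov}(x, y) = \operatorname{sign}(r) \, \sigma^2_x, \qquad \sigma^2_y = \sigma^2_x + \sigma^2_\text{noise} = \frac{\sigma^2_x}{r^2} \]

\[ \operatorname{cor}(x, y) = \frac{ \operatorname{sign}(r) \, \sigma^2_x }{ \sigma_x \cdot \sigma_x / |r| } = \operatorname{sign}(r) \, |r| = r \]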

The question of how to do this came up years ago at an old job I had. I don't recall anybody coming up with a solution back then. Live and learn!

If you're generating all the data yourself, it's easier to use a multivariate random generator. The MASS package's mvrnorm function has an empirical argument which can even make the sample correlation work out exactly (rather than just in expectation). And if you need several new variables that relate to existing data, you probably want a more sophisticated method.
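For the case where both variables are generated from scratch, a minimal sketch with mvrnorm might look like the following. The sample size, zero means, and unit variances are arbitrary choices for illustration.

```r
library(MASS)

r <- 0.75
# Covariance matrix for two unit-variance variables with correlation r
Sigma <- matrix(c(1, r,
                  r, 1), nrow = 2)

set.seed(1)
# empirical = TRUE makes the sample mean and covariance match mu and Sigma
xy <- mvrnorm(n = 100, mu = c(0, 0), Sigma = Sigma, empirical = TRUE)
cor(xy[, 1], xy[, 2])
```

With empirical = TRUE the sample correlation should equal r up to floating-point error; with the default empirical = FALSE it matches only in expectation.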