# Add variance to achieve a given correlation

Saturday November 28, 2020

If you have data *x* and you want some data *y* that has a given
correlation *r* with *x*, multiply *x* by the sign of *r* and then
add independent random noise whose variance is the variance of *x*
divided by the square of the desired correlation, minus the variance
of *x*. (This needs a nonzero *r*.)

You can do this in R, for example:

```
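# Return data whose correlation with x is r in expectation (r must be nonzero).
correlated <- function(x, r) {
  # Noise variance from the variance sum law: var(x)/r^2 - var(x)
  noise_sd <- sqrt(var(x) / r^2 - var(x))
  x * sign(r) + rnorm(length(x), sd = noise_sd)
}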
x <- rnorm(100)
cor(x, correlated(x, 0.75))
## [1] 0.7537228
```

The resulting correlation is only equal to the desired value in expectation; you can tweak a value in the result to make it more exact if you like, or use a more sophisticated method.
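One way to get the sample correlation exact, sketched here under some assumptions rather than offered as the only option: strip out whatever part of the noise happens to be linearly related to *x*, then rescale what is left to exactly the target variance. The name `correlated_exact` is made up for this illustration, and *r* is assumed to be nonzero.

```r
# Hypothetical helper: like correlated(), but the sample correlation is exact.
correlated_exact <- function(x, r) {
  noise <- rnorm(length(x))
  # Residuals from regressing the noise on x have zero sample covariance with x.
  resid_noise <- residuals(lm(noise ~ x))
  # Rescale so the sample variance of the noise is exactly var(x)/r^2 - var(x).
  target_sd <- sqrt(var(x) / r^2 - var(x))
  x * sign(r) + resid_noise * target_sd / sd(resid_noise)
}

set.seed(1)
x <- rnorm(100)
y <- correlated_exact(x, 0.75)
cor(x, y)
```

Because the adjusted noise is exactly uncorrelated with *x* in the sample, the variances add exactly and the correlation matches *r* up to floating-point error.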

The method here works because, for a simple linear relationship like this one, the squared correlation equals the coefficient of determination, and because the variance sum law says the variances of independent terms add.

\[ r^2 = R^2 = 1 - \frac{ \sigma^2_\text{noise} }{ \sigma^2_{x + \text{noise}} } = \frac{ \sigma^2_x }{ \sigma^2_x + \sigma^2_\text{noise} } \]

\[ \sigma^2_\text{noise} = \frac{\sigma^2_x}{r^2} - \sigma^2_x \]
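As a quick numeric check of the two formulas, here is the arithmetic for a unit-variance *x* and *r* = 0.75. Both values are picked purely for illustration.

```r
var_x <- 1
r <- 0.75

# Noise variance from the second formula
var_noise <- var_x / r^2 - var_x
print(var_noise)
## [1] 0.7777778

# Plugging it back into the first formula recovers r
implied_r <- sqrt(var_x / (var_x + var_noise))
print(implied_r)
## [1] 0.75
```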

The question of how to do this came up years ago at an old job I had. I don't recall anybody coming up with a solution back then. Live and learn!

If you're generating all the data yourself, it's easier to use a
multi-dimensional random generator. The MASS package's mvrnorm
function has an `empirical` flag which can even make things work out
exactly (rather than just in expectation). And if you need more new
variables that relate to existing data, you probably want a
more sophisticated method.
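For the generate-everything-yourself case, a minimal sketch with `mvrnorm` might look like the following. The zero means, unit variances, and sample size are arbitrary choices for the example, and it assumes the MASS package is installed.

```r
library(MASS)

r <- 0.75
# Covariance matrix for two unit-variance variables with correlation r
Sigma <- matrix(c(1, r,
                  r, 1), nrow = 2)

# empirical = TRUE makes the sample covariance match Sigma, not just its expectation
xy <- mvrnorm(n = 100, mu = c(0, 0), Sigma = Sigma, empirical = TRUE)
cor(xy[, 1], xy[, 2])
```

With `empirical = FALSE` (the default), the same call gives a correlation that only matches *r* on average.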