# Sum of Squares: from Definitional to Computational Formula

Sunday September 27, 2020

This is a very old draft, from perhaps 2014-08-17 or so. I don't really remember why I started the way I did, so I thought I'd keep it around. I dropped the idea of writing this up for a long time and just came back to it now.

In introductory statistics the Sum of Squares $$SS$$, implicitly the sum of the squared deviations from the mean, is taught early on along with variance and standard deviation. Often a "definitional" and "computational" formula are introduced in rapid succession. Especially for folks who aren't terribly comfortable with math notation, this can be confusing. I will explain what's going on.

Here is how you get the Sum of Squares $$SS$$.

You start with a bunch of numbers. Call them $$X_1, X_2, ..., X_N$$. So you have $$N$$ numbers.

You get the mean of these numbers, which we'll call $$\overline{X}$$ (read as "x-bar"). First add up all the numbers. That's $$\sum\limits_{i=1}^{N} X_i$$. This is sometimes written as just $$\sum X$$ but for later clarity let's keep the limits. Then you divide that sum by $$N$$ to get the average, also known as arithmetic mean, $$\overline{X}$$. We'll use this equation later.

$\overline{X}=\frac{\sum\limits_{i=1}^{N} X_i}{N}$

Now that we have the mean, we can go through all our numbers again. For each one, we subtract the mean to get the deviation $$X_i - \overline{X}$$. Deviations may be positive or negative. We square each one that we get, and we add these all up. This is the Sum of Squares $$SS$$. This is the definition of the Sum of Squares. So it's called the Definitional Formula.

$SS=\sum\limits_{i=1}^{N} \left(X_i-\overline{X}\right)^2$

So far everything is very sensible. But then you hear about the Computational Formula for the Sum of Squares.

$SS=\sum\limits_{i=1}^{N} X_i^2-\frac{\left(\sum\limits_{i=1}^{N} X_i\right)^2}{N}$

If you're like me, it isn't immediately clear that these two are equal. The computational formula doesn't even have the mean in it!