A sensible characterization of mode, median, and mean

Tuesday September 17, 2013

Often "types of data" are introduced all together, and then "measures of central tendency" are introduced all together. By "types of data" I mean nominal, ordinal, and numeric (leaving aside interval vs. ratio). By "measures of central tendency" I mean mode, median, and mean.

A common response to this exposition, even if median is justified with reference to skew, is that mode is a stupid thing and its inclusion in the list is almost insulting.

A much nicer exposition would introduce each type of data together with the "measure of central tendency" that is in some sense the best you can do for that type of data.

With nominal data the best you can do is frequency counts. Mode reports the most common thing. The mode of an election is (often) the winner. This is useful.
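As a minimal sketch, here is the election case in Python. The ballots are made up for illustration. The standard library's `collections.Counter` gives the frequency counts, and the mode is simply the most common entry:

```python
from collections import Counter

# Hypothetical ballots: nominal data, so the only meaningful operations
# are checking equality and counting.
ballots = ["Alice", "Bob", "Alice", "Carol", "Alice", "Bob"]

counts = Counter(ballots)                  # frequency count per candidate
winner, votes = counts.most_common(1)[0]   # the mode: the most frequent value

print(winner, votes)  # Alice 3
```

With a tie, `most_common(1)` returns only one of the tied values. The standard `statistics.multimode` returns all of them, which is one reason the mode is only "(often)" the winner.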

With ordinal data you can do better by putting everything in order. Now even if there are 8 A's, 6 B's, and 7 C's, so that A is the mode, B is still more representative for its middle-ness: of the 21 values in order, the middle (11th) one is a B.
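A short sketch of the 8/6/7 example, assuming the grades are ranked A < B < C. The `ORDER` mapping is an illustrative choice, not something the data carries on its own. `statistics.median_low` always returns an actual observed value rather than averaging two middle values, which suits ordinal data where "halfway between A and B" has no meaning:

```python
import statistics

# Assumed ranking for the ordinal grades: A < B < C.
ORDER = {"A": 0, "B": 1, "C": 2}
RANK_TO_GRADE = {rank: grade for grade, rank in ORDER.items()}

grades = ["A"] * 8 + ["B"] * 6 + ["C"] * 7  # 21 values, as in the text

mode = statistics.mode(grades)  # most frequent grade

# median_low sorts its input and returns an observed value, never an average.
median_rank = statistics.median_low([ORDER[g] for g in grades])
median = RANK_TO_GRADE[median_rank]

print(mode, median)  # A B
```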

Finally, with numeric data you can take the mean. You may or may not want to, but people often do, and they may well be right to.
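A sketch of why you "may not want to", using hypothetical right-skewed numbers (say, incomes in thousands). Because the data are numeric, the mean is available, and it uses the actual distances between values. That is exactly why one large value pulls it away from the median:

```python
import statistics

# Hypothetical right-skewed numeric data: one value far above the rest.
incomes = [20, 25, 30, 35, 1000]

mean = statistics.mean(incomes)      # needs arithmetic, so numeric data only
median = statistics.median(incomes)  # still available for numeric data

print(mean, median)  # mean is 222, median is 30
```

Here the median arguably describes a typical value better. For roughly symmetric data the two measures agree closely, and the mean is often the natural choice.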

Described this way, the types of data and the ways of measuring them form a pleasant pattern: each step up (counting, then ordering, then arithmetic) allows one more measure, an organized relationship that gives every part more meaning.

Intro statistics textbook authors, you may update your texts! (Do any already point this out? I don't recall ever seeing it written.) (Also, I don't particularly like the term "measure of central tendency" so I keep it in scare quotes throughout.)

This post was originally hosted elsewhere.