Well, I’m not going to talk about who won or who should have won. That’s all been decided.
I’m going to focus on the players’ heights. The NYT blog Freakonomics posted about soccer players’ propensity to lie about their heights, focusing on the data in the chart to the right. It shows the distribution of World Cup players by height in centimeters.
I guess it’s important to know that heights are normally distributed – they follow the bell curve. Not many people are really short or really tall and most people are in the broad middle. But that’s not what this chart shows.
Here, you have a bell-shaped curve, more or less, but it’s not nearly as smooth as you would expect. Freakonomics focuses on the extraordinarily small number of players listed at 179 centimeters (about 5′ 10″), which I’ve highlighted in red. This suggests that some players who say they are 180 centimeters are really 179, a hypothesis strengthened by the abnormally large number of players reporting their height as 180 cm.
Freakonomics suggests players exaggerate their height to psych out an opponent, but sports isn’t the only place people lie about their height.
Take OkTrends (via Chart Porn), an offshoot of the internet dating site OkCupid. OkCupid is run by a couple of data geeks who use the site’s database to show how shallow and dishonest their subscribers are. Their most recent example is about height.
This chart shows male OkCupid members exaggerate their height by about two inches, on average. The bell curve on the left is the distribution of the heights of all men in the United States. The curve on the right is the distribution of the heights OkCupid men report.
Two things to notice here. First, the shift to the right means that OkCupid men are exaggerating their height. Second, the flattening out at the top of the graph indicates that the closer one gets to six feet tall, a Holy Grail of sorts, the more likely one is to exaggerate.
(Women exaggerate height as well, but – duh – don’t lie their way up to six feet.)
So who cares?
The soccer data set interests me because it exhibits a phenomenon common to insurance claims – clustering. Clustering occurs when a distribution of data looks off kilter because the data tend to gather at certain break points.
Claims tend to settle for round amounts – say $5,000, $10,000, $1 million, etc. So you get a lot of claims settling at $5,000, but very few at $4,999 or $5,001.
So your distribution curve looks jagged, like the soccer chart. That makes it hard to model. So you need to do some smoothing.
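Before smoothing, it can help to put a number on how jagged a spot is. Here is a rough Python sketch of one way to do that: compare the count at a value with the average count of its two neighbors. The function name `heaping_ratio` is my own, and the counts are made up for illustration; they are not the World Cup data.

```python
def heaping_ratio(counts, value):
    """Count at `value` divided by the mean count of its two neighbors.

    A ratio well above 1 suggests clustering at `value`;
    a ratio well below 1 suggests a gap next to a cluster.
    """
    here = counts.get(value, 0)
    mean_neighbors = (counts.get(value - 1, 0) + counts.get(value + 1, 0)) / 2
    if mean_neighbors == 0:
        # No neighbors to compare against.
        return float("inf") if here else 0.0
    return here / mean_neighbors


# Made-up counts of players per listed height (cm), for illustration only.
heights = {178: 35, 179: 12, 180: 60, 181: 25}

print(round(heaping_ratio(heights, 180), 2))  # well above 1: a spike
print(round(heaping_ratio(heights, 179), 2))  # well below 1: a dip
```

The same check works on claim amounts if you count claims per dollar bucket instead of players per centimeter.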
I grouped the data in two-, three- and five-centimeter ranges, but I’m showing only the five-centimeter grouping. Five centimeters is significant, because the OkCupid geeks suggest that exaggerators tack up to two inches onto their height, but rarely more. One centimeter is about 0.4 inches, so five centimeters is about two inches.
You get the chart at right – sure looks like a bell curve to me. So I’d conclude that in the metric system, people routinely list their heights rounded up to the next multiple of five centimeters.
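For anyone who wants to try the grouping themselves, here is a minimal Python sketch. It assumes each bin ends on a multiple of the bin width (so 176 through 180 land together), which is my guess at a sensible cut, not necessarily the one behind the chart. The function name `group_heights` and the sample counts are made up for illustration.

```python
from collections import Counter


def group_heights(counts_by_cm, width=5):
    """Pool per-centimeter counts into bins labeled by their upper edge.

    With width=5, heights 176..180 go into bin 180, 181..185 into bin 185.
    """
    grouped = Counter()
    for height_cm, n in counts_by_cm.items():
        # Integer ceiling division: round up to the next multiple of `width`.
        upper = -(-height_cm // width) * width
        grouped[upper] += n
    return dict(sorted(grouped.items()))


# Made-up counts of players per listed height (cm), for illustration only.
sample = {176: 30, 177: 28, 178: 35, 179: 12, 180: 60,
          181: 25, 182: 27, 183: 40}

print(group_heights(sample))           # five-centimeter bins
print(group_heights(sample, width=2))  # two-centimeter bins
```

Labeling each bin by its upper edge matches the idea that a player rounds up to the top of his range: the dip at 179 and the spike at 180 end up in the same bin and cancel out.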
At this point I’d only add that the dataset is skewed somewhat by the presence of American players, whose height is measured in inches. They will tend to fudge their way up to 183 centimeters (6 feet).