## Number games

Apparently, the liberal web site Daily Kos thinks its polling firm has bilked it, and it’s relying on a statistical analysis of poll results that appear to show some extraordinarily unlikely patterns.

The pollster, Research 2000, has vigorously denied the accusations. And it is fighting them hard, to the point of having its lawyer insist that another critic, Nate Silver of fivethirtyeight.com, cease and desist from his harsh commentary.

The analysis can be found here. And I think the analysis is worth summarizing. Of course, I don’t know whether any results are fabricated. Nor do I know what legitimate adjustments Research 2000 makes to its raw data. But I think the discussion is important for actuaries and other number crunchers to understand.

Parts of the critique are tough to follow, in part because the authors – a political consultant, a retired physicist and a wildlife research technician – are trying to explain fairly sophisticated analysis in Everyman terms. That’s never easy. It’ll take a long post from me just to summarize their work.

Basically, the authors work off two principles.

First, although polls reflect attitudes and pick up trends, the actual results should behave randomly. For example, two polls taken on the same day from the same population should usually give different results – just as if you flip a coin 100 times, then flip it another 100 times, the number of heads probably won’t be the same.

Second, it is difficult to manipulate random numbers, because the manipulation won’t look random.

To show how the data fails to act randomly, the authors point to the results below, which show men’s and women’s favorable and unfavorable ratings for leading politicians.

See the pattern?

These numbers follow some well known patterns, like women are more favorably disposed to Democrats and men are more favorably disposed to Republicans. Now look closer. The percentage of men favorable to Obama is 43%, an odd number. The percentage of women favorable to Obama is 59%, also an odd number.

Similarly, the percentage of men unfavorable to Obama is an even number (54%), and so is the percentage of women unfavorable (34%).

The same pattern is true for every politician. If the percentage of men favorable is an odd number, so is the percentage of women. And if one is an even number, so is the other. And it’s true everywhere in the table. It’s even true for the undecideds.

Now could this happen randomly? Of course it could. But the authors assert that Research 2000 over a period of months had 778 Male/Female breakouts. And this pattern was found 776 times. The authors say the probability of the unfavorables turning out that way is less than 1 in 10²³¹. They compare that to the number of atoms in our cosmic horizon, but I prefer to point out that you would be more likely to win PowerBall 27 times in a row.
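For the curious, here is a rough sketch of how a number that tiny can arise. It assumes each breakout's odd/even match is an independent fair coin flip, which is my simplification and not necessarily the authors' exact method. The function name `log10_parity_tail` is made up for illustration. It won't necessarily reproduce the authors' exact figure, which depends on the precise match counts they used, but it lands in the same neighborhood.

```python
import math

def log10_parity_tail(pairs: int, matches: int) -> float:
    """log10 of P(at least `matches` matches in `pairs` fair coin flips)."""
    # Count the outcomes with `matches` or more matches, using exact integers.
    favorable = sum(math.comb(pairs, k) for k in range(matches, pairs + 1))
    # Under the fair-coin assumption, all 2**pairs outcomes are equally likely.
    return math.log10(favorable) - pairs * math.log10(2)

# 778 breakouts and 776 matches are the counts quoted above.
print(log10_parity_tail(778, 776))
```

The result is the exponent of the probability, so a value near -229 means roughly 1 chance in 10 followed by 228 zeros.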

Next, the authors look at some of the small subsamples from the polls. There, random variation should play a much bigger role in the numbers. That’s why the margin of error from a small sample is greater than the margin of error from a large sample. Or even more simply, that’s why the percentage of ‘heads’ from 1000 coin flips is probably closer to 50% than from 10 flips.
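To put rough numbers on that, here is a minimal sketch using the textbook margin-of-error formula for a proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); real pollsters apply weighting that this ignores, and the function name `margin_of_error` is my own.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

# Margin of error, in percentage points, for samples of different sizes.
for n in (10, 100, 1000):
    print(n, round(100 * margin_of_error(n), 1))
```

With 10 respondents the margin is about 31 points; with 1,000 it shrinks to about 3.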

So results in subsamples should fluctuate a lot, and the lack of fluctuation in a subsample would be damning.

So the authors looked at the favorability margin (% favorable minus % unfavorable) of two relatively small groups over time – Independents (no political affiliation) and Other (neither Democrat nor Republican but something else, like a Libertarian). Here are the results for Obama over time:

Not enough variance

We’re interested in three columns. Two are labeled Marg, being the margin – the difference between favorable and unfavorable. One is the margin for Independents, and one is the margin for Other. The third column of interest is Diff, which is just the difference between the two margins.

Notice the margins fluctuate quite a bit within each category. The margin for Independents goes from 43 to 67. And the margin for Other goes from 46 to 71.

The authors contend that, over weeks and weeks (they look at 60 consecutive weeks of data), Obama’s popularity among Independents doesn’t change much, nor does his popularity change much among persons in the Other category. The jumps and dips in the data come mainly from random fluctuation.

Next, they contend that if most of the fluctuation is random, you should see that randomness in the difference column – the swing from highest difference to lowest difference should be really great. Some weeks the Independents would randomly move one direction while the Other would randomly move the opposite way. And in other weeks, the two groups will move in the same direction.

Moving in tandem

But that ain’t what the data show. The two margins move nearly in lockstep. The graph at the left shows this.

When the margin for Independents rises a little, the margin for Other rises a little. When the margin for Independents rises a lot, the margin for Other rises a lot. When the margin for Independents falls a little, so does the margin for Other.

So the difference between the margins – the right hand column – doesn’t move much at all. The highest difference is +3, and the lowest difference is -4.

The margins are highly correlated. Absent any other explanation, it is one heck of a coincidence.

To really bring the point home, I’ve created the tables to the right. The upper table just summarizes the data reported by Research 2000. It pulls the Marg columns and the Diff column from the table we were just looking at. Note the correlation statistic of 0.94. (For the math-challenged: The maximum is 1.00.) That tells you the Independent margin and the Other margin appear to move in lockstep.

The lower table takes the same numbers and rearranges them so that they are essentially uncorrelated. (The correlation is -0.08.) Now look at the Diff column. See how it fluctuates, from -26 to +16. This is what you would expect to see.

Now I’ve also included another statistic, the variance, because that is what the authors want you to look at. If you aren’t used to statistics, variance measures the variability of numbers in a data set.

Now variance has a fun property. If two data sets are uncorrelated (say the margin for Independents and the margin for Other), then if you create a third data set by taking the difference between each pair of values, that third data set will have a variance equal to the sum of the variances of the two original data sets.

If that’s hard to follow, don’t worry; here’s an example.

In the last tables, the first data set is the Independent margin. Its variance is 69.84. The second data set is the Other margin, and its variance is 85.55. The sum of those is 155.39. So if the two data sets were uncorrelated, the variance of the third data set (Independent margin minus Other margin) should be close to 155.39. (Random variation says it will almost certainly not be exactly 155.39.)

But notice the third data set of the actual data is just 10.57 – way, way lower than you would expect. The uncorrelated data sets I created generate a variance in the difference column of 167.43, about what you would expect if the results were randomly generated.
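The general rule behind this is that the variance of a difference equals the sum of the two variances minus twice the covariance. Here is a small sketch that plugs in the figures quoted above. The function name `variance_of_difference` is my own, and the correlations are the rounded values from my tables, so the output only approximates the table figures.

```python
import math

def variance_of_difference(var_x: float, var_y: float, corr: float) -> float:
    """Var(X - Y) = Var(X) + Var(Y) - 2 * corr * sd(X) * sd(Y)."""
    return var_x + var_y - 2 * corr * math.sqrt(var_x * var_y)

VAR_INDEPENDENT = 69.84  # variance of the Independent margin, quoted above
VAR_OTHER = 85.55        # variance of the Other margin, quoted above

# 0.0 is the ideal uncorrelated case; 0.94 and -0.08 are the rounded
# correlations from the upper and lower tables.
for corr in (0.0, 0.94, -0.08):
    print(corr, round(variance_of_difference(VAR_INDEPENDENT, VAR_OTHER, corr), 2))
```

A correlation of 0.94 gives a variance of roughly 10, and a correlation of -0.08 gives roughly 168. Both are close to the 10.57 and 167.43 in the tables; the small gaps presumably come from rounding the correlations to two decimal places.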

Now the authors focus on variance because it lets them rigorously test how likely it is that the Research 2000 data was arrived at by chance. The answer: 1 in 10 quadrillion, which is a nerdy way of saying not too damn likely.

Fortunately, the authors’ final argument is fairly simple, and it rests on a fascinating statistical fact: Nothing happens more often than you think it does.

That means if you were to write down a random list of single-digit numbers off the top of your head, you probably wouldn’t write the number ‘6’ two times in a row. However if a true random number list generates the number ‘6’, then the next number is as likely to be a ‘6’ as any other number. (Math-niks will recognize I’m talking about a sample from the discrete uniform distribution. Others need not fret that detail.)

Just checking

To show this, the authors present a graph from the famed pollster Gallup. It shows the distribution of the change in Obama’s favorability rating from one survey to the next. (There are 162 surveys, a new one conducted every three days.)

The graph looks like the normal distribution – the bell curve. And it should. From day to day, there shouldn’t be much change in Obama’s favorability rating. The change that’s there should be statistical noise, and there should be quite a few polls that show no change at all.
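You can see the same shape in a toy simulation. The sketch below is not the Gallup data: it invents a series of polls in which true opinion never moves, then tallies the change from one poll to the next in whole percentage points. The poll count, sample size, true share and seed are all assumptions I picked for illustration.

```python
import random
from collections import Counter

def simulate_changes(polls: int = 2000, respondents: int = 1000,
                     true_share: float = 0.5, seed: int = 0) -> Counter:
    """Tally poll-to-poll changes (in whole points) when opinion never moves."""
    rng = random.Random(seed)
    results = []
    for _ in range(polls):
        # Each respondent is favorable with probability `true_share`.
        favorable = sum(rng.random() < true_share for _ in range(respondents))
        results.append(round(100 * favorable / respondents))
    # Difference between each poll and the one before it.
    return Counter(b - a for a, b in zip(results, results[1:]))

changes = simulate_changes()
total = sum(changes.values())
for delta in range(-3, 4):
    print(f"{delta:+d}: {changes[delta] / total:.1%}")
```

Under these assumptions, a change of zero should turn up in a sizable share of the polls, with larger swings becoming steadily rarer on either side.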

Now let’s look at the same analysis from Research 2000 polls:

Not what you'd like to see.

This graph shows that Research 2000 doesn’t show a bell curve. And it doesn’t show a bell curve because it doesn’t show ‘no change’ often enough.

It looks like the Research 2000 data was manipulated, with someone at the firm randomly changing results but failing to recognize how often there should be no change at all.

Did that really happen? Of course, I don’t know. These things could be a coincidence.

But the authors’ analysis shows that to be unlikely, to say the least. And the founder of the Daily Kos web site intends to sue, concluding that the poll results were manufactured.

I’m not that close to the data, so I can’t make that argument. There could be legitimate explanations for the phenomena, or some fundamental errors in the authors’ work.

I do look forward to the explanations that will emerge in the coming weeks.