Mahmoud Ahmadinejad won the Iranian election with 62.6 percent of the vote, compared to Mousavi's 33.75 percent. Or so says the Iranian government. The fact that Mousavi had such conspicuous public support and that the results were announced so quickly after the vote has raised suspicions of fraud. Indeed, the post-election demonstrations in Iran are predicated on the belief that the vote was fixed.

But was it? It's a hard case to prove, of course. And, in fact, before the election two Washington-based public policy institutes commissioned polls that showed Ahmadinejad leading by a 2 to 1 margin.
Recently the Tehran Bureau, an independent site that's been a source of news for both Andrew Sullivan and The New York Times, posted a purported "smoking gun": a chart showing a perfect linear relation between votes for Ahmadinejad and for Mousavi as the results came in.

That same day the celebrity statistician Nate Silver pretty well debunked the case, showing there would have been the same linear relationship in the 2008 U.S. presidential election had the popular vote been released in six big chunks as it was in Iran.

So it remains an open question. And an important one. Ahmadinejad is certainly a less-than-ideal leader for any country, but discovering that the vote wasn't tampered with would undermine the legitimacy of the protests, and discovering that it was tampered with would undermine Ahmadinejad's claim to power.

The best recent investigations into the possible fraud have come from the University of Michigan statistician Walter Mebane, who specializes in election forensics. Over the past few days, using raw data from the Iranian government, he has compared district-by-district voting patterns in the 2005 election to those in 2009, taking into account the effect of boycotts in 2005 and the extra mobilization in 2009. His conclusion: it looks "a lot like fraud." When he uses a refined model built on the 2005 results to predict 2009's, a lot of the support for Ahmadinejad in the numbers released by the Iranian ministry is unaccounted for.

Over the phone, Mebane explained his approach. His (detailed but manageable) description is below. If you want to wade into the numbers yourself, his paper is available for download (pdf).
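A brief aside on Silver's rebuttal of the chart, before Mebane's account. The reason a near-perfect line is unsurprising can be shown with a small simulation. What follows is only a sketch under stated assumptions: the precincts are synthetic (not the 2008 U.S. or 2009 Iranian returns), the function names are made up for illustration, and it is not Silver's own calculation. It releases invented precinct results in six chunks and measures how linearly the two candidates' running totals move together.

```python
import random

def cumulative_totals(precincts, n_chunks):
    """Release precinct results in n_chunks batches; return the running totals."""
    size = len(precincts) // n_chunks
    totals, a_sum, b_sum = [], 0, 0
    for i in range(n_chunks):
        # The last chunk absorbs any leftover precincts.
        end = len(precincts) if i == n_chunks - 1 else (i + 1) * size
        for votes_a, votes_b in precincts[i * size:end]:
            a_sum += votes_a
            b_sum += votes_b
        totals.append((a_sum, b_sum))
    return totals

def pearson_r(points):
    """Pearson correlation between the x and y coordinates of the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / (var_x * var_y) ** 0.5

def simulate(seed=0, n_precincts=3000, n_chunks=6):
    """Synthetic two-candidate election with no fraud built in (assumed numbers)."""
    rng = random.Random(seed)
    precincts = []
    for _ in range(n_precincts):
        turnout = rng.randint(200, 5000)
        share_a = rng.uniform(0.2, 0.8)  # candidate A's share varies widely by precinct
        votes_a = round(turnout * share_a)
        precincts.append((votes_a, turnout - votes_a))
    return pearson_r(cumulative_totals(precincts, n_chunks))
```

Calling `simulate()` returns the correlation between the two running totals. Under these assumptions it comes out very close to 1 even though individual precincts split anywhere from 20 to 80 percent, because each large release adds roughly the same mix of votes to both candidates' totals.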
Walter Mebane: You need to get a sort of a background to have some expectation of what the vote count should have been in different parts of the country [in 2009]. Ideally I would have very low-level aggregation of the data (polling stations, precincts, and ballot boxes), but that hasn't been released yet.

This data was from, I guess they call them "towns" or "cities." They're like administrative units composed of cities or sets of cities. There's a very wide range of sizes. It's kind of like having county-level data in the United States. I call them "towns" in my paper. There are 366 of these "towns" in this dataset, which was downloaded from the Iranian ministry dataset.

So you have this town-level data of the 2009 election with the votes for all four candidates. And the question is: How do I evaluate that? Do I expect Ahmadinejad to do well or badly in a particular place? So what I did in this paper is use data from previous elections. They had a presidential election in 2005. They had many of the same towns and I was able to match up 320 of the towns that were the same in both 2005 and 2009.

The election in 2005 had two stages: a preliminary round and a runoff between the top two finishers. First I did analysis just using the second stage results, which was Ahmadinejad versus Rafsanjani, and tried to predict the 2009 results. I had two different measures for the vote in 2005. One was a function of the proportion of the votes for Ahmadinejad and the other was the ratio of the total votes cast in 2009 divided by the total votes cast in 2005. The first variable was intended to measure partisan support and the second variable [was added] because I had read that people had boycotted the election in 2005 (people opposed to Ahmadinejad) and we had a surge in turnout in 2009.

That simple analysis showed a pattern that did describe the data [from the 2009 election].
It produced a relatively small number of so-called "outliers": towns where the statistical method I'm using is designed to reject observations that don't match the model you're putting on the data. I had nine different towns out of 320 that were thrown out as a result of that analysis. And another 50 or so that were suspicious.

In the version of the paper I released this morning I made a change. Someone gave me data from the first round of the 2005 election. In the first round there were seven candidates instead of just the two I had before. And that gives you a more refined political parsing of the different places. So now we're looking at 2005 divided seven ways in some political space instead of two ways. That should give a more refined picture and a better ability to predict what happened in 2009. And I also have a validation that you can predict the second stage in 2005 using the first stage in 2005 and the results look pretty reasonable. So the data are real, I think.

So when I do the analysis predicting what happened in 2009 based on the first stage of the 2005 election, I get sort of naturalistic patterns in the coefficients that represent the model, so to speak, but the number of outliers explodes. The number of observations that the model does not describe goes from 8 to 79. And that's 79 out of 320, that's a large number. And when I look at the towns that are suspicious, that the model doesn't describe all that well but that I don't throw out completely, that rises to 192 observations. So out of 320 towns, 192 of them are not well described by the model. Moreover, in 172 of those, it's Ahmadinejad's vote that looks suspicious. And among the 172, 119 of those have Ahmadinejad getting more votes than this natural model predicts.

So now that looks a lot like fraud.
He's gaining extra votes that don't match what you predict based on a refined examination of the previous year's election, taking into account the extra mobilization, which the model does, and also looking at all four candidates. It's not a proof by any stretch of the imagination, but it's certainly a more intuitive explanation to say that there are widespread distortions where people simply added a lot of votes or did something to augment Ahmadinejad's support.

UPDATE: Walter Mebane has a new version of his paper up [pdf]. He emails:

"The patterns in Figure 1 strongly suggest that in many ballot boxes the votes for Karoubi and Rezaei were thrown out while in many ballot boxes extra votes were added for Ahmadinejad. The pattern in Figure 2 suggests that without the ballot-box stuffing fraud the election outcome would have been at least a runoff between Ahmadinejad and Mousavi."

Photos from Flickr users Hamed Saber (cc) and .faramarz (cc).
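For readers who want a feel for the first analysis Mebane describes in the interview (predicting a town's 2009 result from its 2005 vote share and the 2009-to-2005 turnout ratio, then looking for towns the model does not describe), here is a rough sketch. It is not Mebane's model or his data: it swaps in ordinary least squares on a logit scale with a median-absolute-deviation cutoff, every number is synthetic, and all function names and parameters are hypothetical.

```python
import math
import random
import statistics

def logit(p):
    """Map a vote share in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))

def fit_ols(xs, ys):
    """Least-squares coefficients for y ~ 1 + x1 + x2 via the normal equations."""
    rows = [(1.0, x1, x2) for x1, x2 in xs]
    k = 3
    # Augmented matrix [X'X | X'y], solved by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

def flag_outliers(xs, ys, threshold=3.5):
    """Indices of towns whose residual is far from the median residual."""
    b0, b1, b2 = fit_ols(xs, ys)
    residuals = [y - (b0 + b1 * x1 + b2 * x2) for (x1, x2), y in zip(xs, ys)]
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad  # MAD rescaled to match a normal standard deviation
    return [i for i, r in enumerate(residuals) if abs(r - med) > threshold * scale]

def demo(seed=0, n_towns=320, n_padded=10):
    """Synthetic towns; a few get an artificial boost to their 2009 share."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_towns):
        share_2005 = rng.uniform(0.3, 0.8)     # assumed 2005 share
        turnout_ratio = rng.uniform(1.0, 2.0)  # assumed 2009 votes / 2005 votes
        x1, x2 = logit(share_2005), math.log(turnout_ratio)
        xs.append((x1, x2))
        # Invented "honest" relationship plus noise, on the logit scale.
        ys.append(0.2 + 0.8 * x1 - 0.5 * x2 + rng.gauss(0, 0.1))
    padded = sorted(rng.sample(range(n_towns), n_padded))
    for i in padded:
        ys[i] += 1.5  # artificial padding for illustration only
    return padded, flag_outliers(xs, ys)
```

`demo()` returns the indices of the artificially padded towns alongside the indices the cutoff flags, so the two lists can be compared. A flagged town here only means the simple model fits it poorly; as Mebane says of his own results, that is not proof of anything.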