As you can guess from the title of this post, I am trying to pose elections as statistical experiments. I know there is a big difference; unlike polling, elections do not involve taking a sample of data to make a decision. Instead, every single vote is expected to be counted in every election. So, yes, in that sense, an election probably is not a statistical experiment. There is still, however, some aspect of uncertainty, or errors, in the elections as well.
One source of this error could simply be counting errors, whether the counting is done by a machine or by hand. I am not going to discuss this source of error, as there is always the possibility of a recount, or of several recounts for that matter. These recounts, if done independently, can significantly decrease the magnitude of potential counting errors.
Another source of error, which has been widely discussed in recent years, is the possibility of ineligible individuals having voted in an election. Most of the arguments provided by the election deniers after the presidential elections of 2020 were related to potential errors in this category. Claims of widespread and outcome-changing voter fraud abounded, and over 60 court cases were filed challenging the results of those elections. Of the 64 cases filed, the plaintiffs won only a single case, which involved too few votes to make any substantial difference to the election results. As evident from this one case, and more simply as a matter of being realistic, I don’t believe anyone can confidently deny the possibility of some individuals, who are not eligible to vote, having voted in any given election. The question is not whether such errors are possible; it is one of the magnitude of the errors, and whether they can be outcome-changing. In other words, what would be a good confidence interval for determining the result of any given election?
To address this question, one needs a large set of data from past elections, along with the corresponding magnitude of this error in each of those elections. I don’t believe such a dataset exists. Absent such a dataset, probably the most reliable source of data is the recent investigation in the state of Georgia, where the state’s top elections official said that a check of voter rolls found that 20 of the 8.2 million people registered to vote in the state were not U.S. citizens. The office of Secretary of State Brad Raffensperger also said that none of those people had cast a ballot in a recent general election, but nine of the 20 had voted in previous elections and the other 11 had no record of voting.
When we try to find a reliable confidence interval, especially one that does not enjoy the availability of a large dataset, it would be prudent to overestimate rather than underestimate the magnitude of potential errors. So, to be on the safe side, let’s just ignore the fact that 11 of the 20 non-citizen registrants have no record of voting, and instead, assume that all 20 have voted, and that they have voted in every election possible. This itself is a significant upscaling of the magnitude of the potential error, as there could be a large number of such possible elections in one’s lifetime, and the non-citizen registrant may have voted in only one or a few of them. Still, for even further increased confidence, let’s multiply the number by another factor of 10, and assume that there were a total of 200 ineligible voters out of the 8.2 million registered voters in Georgia. That is a ratio of less than 0.0025% (one in 41,000, to be more precise). With the above multipliers that, combined, have resulted in assuming a number significantly larger than 10 times the actual number of ineligible voters, I think the above fraction of 0.0025% can be considered a reasonable confidence interval of the election results in Georgia.
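The arithmetic in this step is easy to reproduce. The short Python sketch below uses only the figures quoted above; the variable names are my own choice for illustration.

```python
# Figures quoted above from the Georgia voter-roll check.
registered_voters = 8_200_000
noncitizen_registrants = 20  # all 20 assumed to have voted, in every election
safety_factor = 10           # the deliberate tenfold overestimate

assumed_ineligible = noncitizen_registrants * safety_factor
rate = assumed_ineligible / registered_voters

print(f"assumed ineligible voters: {assumed_ineligible}")
print(f"rate: {rate:.5%}")  # just under 0.0025%
print(f"one in {registered_voters // assumed_ineligible:,}")
```

The rate comes out slightly below 0.0025%, which is why the text treats 0.0025% as an upper bound for Georgia.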
To make the above number applicable to other states as well as national elections, we should also factor in the fraction of immigrant population in various states. Based on my Google search, it seems California has the largest population, as well as the largest percentage of foreign-born residents. That is a total of over 10M immigrants, or about 27% of the population of the state. In contrast, Georgia has less than 1.2M immigrant residents, which is less than 11% of the population of the state. To make the above confidence interval, obtained for Georgia, applicable to the worst-case scenario with the largest percentage of immigrant population, let’s apply an additional factor of 27/11, or about 2.5, to make it 0.00625%. Actually, while we are in the spirit of overestimating the magnitude of the error, why don’t we make it an even 0.01%? It’s such a simple number to remember and to apply to large numbers. For Georgia, 0.01% of the 8.2 million registered voters is 820 voters, which is almost 100 times the actual magnitude of the error, or to be more precise, 91 times the actual number of ineligible voters, which was 9. Also note that we assumed these ineligible voters have voted in every single election, not just some. So, with this level of overestimation, I hope everyone would be comfortable with my proposed confidence interval of 0.01%.
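For readers who want to check the scaling, here is a small Python sketch of the same steps. It uses the percentages quoted above; the variable names are mine, and the factor is rounded up to 2.5 exactly as in the text.

```python
georgia_rate_pct = 0.0025   # percent, the Georgia upper bound from above
ca_foreign_born_pct = 27    # California, about 27% foreign-born
ga_foreign_born_pct = 11    # Georgia, less than 11% foreign-born

scale = ca_foreign_born_pct / ga_foreign_born_pct
scaled_pct = georgia_rate_pct * 2.5  # the text rounds the factor up to 2.5
rounded_pct = 0.01                   # then rounds up again to an even 0.01%

georgia_registered = 8_200_000
implied_ineligible = georgia_registered / 10_000  # 0.01% means 1 in 10,000
actual_ineligible = 9  # non-citizens with a record of voting in Georgia

print(f"scale factor: {scale:.2f}")
print(f"scaled rate: {scaled_pct:.5f}%, rounded up to {rounded_pct}%")
print(f"implied ineligible voters in Georgia: {implied_ineligible:.0f}")
print(f"overestimate: {implied_ineligible / actual_ineligible:.0f} times")
```

The last line shows how much headroom the 0.01% figure leaves relative to the nine voters actually found.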
What the above confidence interval means is that, e.g., for the popular vote in a presidential election with a total of 150M voters, the margin of error is (significantly) less than 15,000. That is, if the difference between the vote counts for any two candidates is less than 15,000, it is better not to call the result of that popular vote. But if the difference is in the millions, then there is close to zero chance that this kind of error changed the result. Similarly, in the state of Pennsylvania with about 7M voters in the recent presidential election, the margin of error would be significantly smaller than 700. As a result, with the difference of over 120,000 between the two leading candidates, there should be very little doubt about the result of that election in Pennsylvania (discounting potential intentional tampering that I do not consider as statistical errors, and will briefly discuss below).
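The rule described above can be written as a two-line check. This is only a sketch of my proposed rule, not an established statistical test; the function names and the 12,000-vote example are hypothetical.

```python
def margin_of_error(total_votes: int, one_in: int = 10_000) -> float:
    """Margin of error at the proposed 0.01% rate (1 vote in 10,000)."""
    return total_votes / one_in


def safe_to_call(vote_difference: int, total_votes: int) -> bool:
    """True when the difference exceeds the margin of error."""
    return vote_difference > margin_of_error(total_votes)


print(margin_of_error(150_000_000))       # national popular vote
print(margin_of_error(7_000_000))         # Pennsylvania
print(safe_to_call(120_000, 7_000_000))   # Pennsylvania's difference
print(safe_to_call(12_000, 150_000_000))  # hypothetical 12,000-vote gap
```

A national gap of 12,000 votes would fall inside the 15,000 margin, so the rule would decline to call it, while Pennsylvania's gap clears its margin easily.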
Now, if we apply the above test of the confidence interval of 0.01% to the results of the presidential election in 2020, we would get the following for the swing states:
- Arizona, total: ~3.3M, margin of error: 330, difference: 10,457
- Florida, total: ~11M, margin of error: 1,100, difference: 371,686
- Georgia, total: ~4.9M, margin of error: 490, difference: 11,779
- Michigan, total: ~5.5M, margin of error: 550, difference: 154,188
- North Carolina, total: ~5.4M, margin of error: 540, difference: 74,483
- Ohio, total: ~5.8M, margin of error: 580, difference: 475,669
- Pennsylvania, total: ~6.8M, margin of error: 680, difference: 81,660
- Wisconsin, total: ~3.2M, margin of error: 320, difference: 20,682
As we see, while admittedly some differences seem quite small compared to the voting population of the states, all the differences are significantly larger than the margin of error. Even the smallest relative difference, in Georgia, is still more than 24 times larger than the very generous margin of error of 0.01% that we derived above. Therefore, calling that election stolen, based on the argument that ineligible voters had skewed the results in favor of a certain candidate, is, at best, misguided and less than fair.
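The comparison for all eight states can be reproduced with a short Python sketch. The totals are the rounded figures from the list above, and Georgia's difference is entered as the certified 11,779; the dictionary layout and names are my own.

```python
# (approximate total votes, difference between the two leading candidates)
states = {
    "Arizona": (3_300_000, 10_457),
    "Florida": (11_000_000, 371_686),
    "Georgia": (4_900_000, 11_779),  # certified 2020 margin
    "Michigan": (5_500_000, 154_188),
    "North Carolina": (5_400_000, 74_483),
    "Ohio": (5_800_000, 475_669),
    "Pennsylvania": (6_800_000, 81_660),
    "Wisconsin": (3_200_000, 20_682),
}

ratios = {}
for name, (total, difference) in states.items():
    margin = total / 10_000  # 0.01% of the total, i.e. 1 vote in 10,000
    ratios[name] = difference / margin
    print(f"{name}: margin {margin:,.0f}, difference {difference:,}, "
          f"ratio {ratios[name]:.1f}x")

closest = min(ratios, key=ratios.get)
print(f"smallest ratio: {closest}, {ratios[closest]:.1f}x")
```

Sorting by ratio rather than by raw difference matters here: Arizona has the smallest difference in votes, but Georgia has the smallest difference relative to its margin of error.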
Of course, the above errors are not the only potential sources of error. There could be malice in the actual counting process, or manipulation of the results by the voting machines. I would not consider such issues as statistical errors, but rather items that should be litigated in courts, as was done in the aftermath of the 2020 elections. A rejection rate of 63 out of 64 (or over 98%), and the fact that the Supreme Court did not take up any of the appeals for these cases, leave little room for speculation about biased rulings by the courts, as the judges were appointees of different presidents over the past several decades, from the entire political spectrum.
Bottom line: the fact that pretty much the only qualification for being considered for an appointment to or nomination for a position in the new administration is denying the results of the 2020 elections speaks volumes about the integrity of such appointees, and makes the outlook for the next four years quite grim.