A Nonpartisan Generic Ballot Aggregator

It’s increasingly well accepted that Democrats are on track for a fairly bad year, at least in terms of the national popular vote. Their president’s approval rating is at 44%, the out-party is showing serious enthusiasm, and inflation is at 8%. None of these conditions generally points to a remotely competitive race for Congress, and many of the partisan pollsters currently releasing public polls show Republicans on track for a comfortable victory come November. But when you sit down and look at the numbers from nonpartisan pollsters, they tell a very different story.

As of the morning of October 30th, two very different pictures are being painted. The first, from nonpartisan public pollsters, is encapsulated in the graph above: it suggests that the November elections are essentially a dead heat, and that if they were held today, the generic ballot would be a virtual tie. The second, from partisan pollsters, is more Republican; in fact, the average of partisan-affiliated pollsters (Trafalgar, Data For Progress, Navigator Research, Rasmussen Reports, Insider Advantage, and Echelon Insights) over the last week suggests an R+3 year.

These are extremely different readings of the race, and the signals on whom to trust conflict. FiveThirtyEight already has an excellent aggregate that averages all of the polls, but we believe there is also value in simply excluding every partisan-affiliated pollster and examining only the nonpartisan ones. That’s why today, with just over a week to go until the midterms, we’re launching our nonpartisan generic ballot average. Our hope is that it will show you what the best public, nonpartisan pollsters are saying about the upcoming elections.

Our methodology is extremely simple. We include every poll of registered or likely voters from a nonpartisan or bipartisan group whose pollster has a FiveThirtyEight rating of B- or better. The average is a weighted blend: 30% is the raw average of polls taken within the last week, 50% is the raw average of polls taken within the last two weeks, and 20% is the raw average of polls taken within the last three weeks. If a firm polls for the same sponsor more than once in a window, we take only its latest poll, and if a firm reports both registered and likely voter results from the same poll, we use only the likely voter screen. Lastly, polls of registered voters receive 3/4 the weight of polls of likely voters.
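To make these rules concrete, here is a minimal Python sketch of how an average like this could be computed. It is an illustration, not the code behind our published average, and several details are assumptions: the `Poll` record and its field names are hypothetical, the grade list treated as "B- or better" is our assumed reading of FiveThirtyEight's scale, a poll counts toward an N-day window when it ended fewer than N days before the as-of date, the 3/4 registered-voter weight is applied inside each window's average, and the shares are rescaled if a window happens to be empty.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# Assumed reading of "B- or better" on FiveThirtyEight's letter-grade scale.
ACCEPTED_GRADES = {"A+", "A", "A-", "A/B", "B+", "B", "B-"}

# (window length in days, share of the final blend), as stated in the methodology.
WINDOWS = [(7, 0.30), (14, 0.50), (21, 0.20)]

# Registered-voter polls count 3/4 as much as likely-voter polls.
POPULATION_WEIGHT = {"lv": 1.0, "rv": 0.75}


@dataclass
class Poll:
    """Hypothetical poll record; field names are illustrative."""
    firm: str
    sponsor: str
    end_date: date
    population: str  # "lv" or "rv"
    grade: str       # FiveThirtyEight pollster grade
    partisan: bool   # True if the firm or sponsor is partisan-affiliated
    margin: float    # Democratic minus Republican share, in points


def eligible(poll: Poll) -> bool:
    """Nonpartisan or bipartisan, graded B- or better, and an RV or LV sample."""
    return (
        not poll.partisan
        and poll.grade in ACCEPTED_GRADES
        and poll.population in POPULATION_WEIGHT
    )


def window_average(polls: list[Poll], as_of: date, days: int) -> float | None:
    """Population-weighted average of the polls ending in the last `days` days."""
    # Assumed window rule: the poll ended fewer than `days` days before as_of.
    recent = [p for p in polls if 0 <= (as_of - p.end_date).days < days]

    # Keep one poll per (firm, sponsor): the latest one, and on the same end date
    # prefer the likely-voter version over the registered-voter version.
    latest: dict[tuple[str, str], Poll] = {}
    for p in recent:
        key = (p.firm, p.sponsor)
        kept = latest.get(key)
        if kept is None or (p.end_date, p.population == "lv") > (
            kept.end_date,
            kept.population == "lv",
        ):
            latest[key] = p

    if not latest:
        return None
    weights = [POPULATION_WEIGHT[p.population] for p in latest.values()]
    weighted_sum = sum(w * p.margin for w, p in zip(weights, latest.values()))
    return weighted_sum / sum(weights)


def nonpartisan_average(polls: list[Poll], as_of: date) -> float:
    """Blend the 7-, 14- and 21-day window averages 30/50/20."""
    pool = [p for p in polls if eligible(p)]
    parts = []
    for days, share in WINDOWS:
        avg = window_average(pool, as_of, days)
        if avg is not None:
            parts.append((avg, share))
    if not parts:
        raise ValueError("no eligible polls in the last three weeks")
    # Assumption: if a window is empty, rescale the remaining shares to sum to 1.
    total_share = sum(share for _, share in parts)
    return sum(avg * share for avg, share in parts) / total_share
```

Because the two- and three-week windows contain the one-week window, the most recent polls end up counting in all three parts of the blend. One effect of keeping only the latest poll per firm and sponsor, as sketched here, is that a frequent tracker cannot outweigh other firms within a window simply by releasing more often.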

This is not our forecast, and it is not our model. In fact, for the reasons mentioned at the beginning of this post, we still find it somewhat hard to believe that Democrats will win the popular vote this cycle, even though our launch average shows the slightest of Democratic leads. We believe Republicans are on track to win the popular vote, are clear favorites to win the House, and have 50/50 odds of taking back the Senate. Yet some of the best public, nonpartisan pollsters say that Democrats are currently favored in this midterm, so it is something worth monitoring. Are they catching something no other analyst is, or are they simply wrong and headed for another 2020-style miss?

We don’t know the answer to this. But we do believe that letting partisan polling artificially drag down the averages is not the soundest practice, and that it meaningfully clouds the picture, to the point where people are unsure what to trust. We don’t want to adjust these polls for house effects either, because polling error changes from cycle to cycle, and “unskewing” polls based on past biases, as some other websites do, is an extremely dangerous game that quickly devolves into data dredging and p-hacking to validate our priors.

Again, it is entirely possible that the partisan firms will be more accurate this cycle and that the public pollsters will be wrong. Partisan firms often work from voter files and frequently contact voters by text or live calls instead of through the online panels more popular among public pollsters, which might help them avoid the overrepresentation of Democrats that doomed polls in 2020. It’s also worth considering that partisan firms may be releasing more Republican-leaning surveys simply because Democrats don’t have much better numbers of their own to show. But this is not a given, and these same firms are not exactly immune to error either, as a quick look at 2020 polls would show.

Moreover, the purpose of this tool is not to provide a forecast but to offer a way of assessing what nonpartisan polling outlets suggest about November, along with an easier way of validating their overall performance against the actual results. If they prove inaccurate, and partisan firms forecasting a red wave bring the polling average much closer to the real results, then these pollsters do not deserve much credit. Conversely, if they prove much more accurate than partisan firms and Democrats do have a much better-than-expected November, then a polling average influenced by partisan polls may have somewhat overestimated Republican strength. Our inclination is that this is the less likely outcome, but it has happened in a couple of special elections.

Our goal isn’t to supplant FiveThirtyEight, which we believe has the best polling aggregator in the business. If you want the full polling picture, including all polls, we strongly recommend checking out their aggregator, which we use regularly as well. Our purpose is simply to provide another open and transparent way to gauge what public, nonpartisan pollsters are indicating and to assess how their surveys perform. We believe a simple, open, and transparent public aggregate of nonpartisan polls has real value, and yet we have been unable to find one anywhere. With that said, it is interesting that our nonpartisan aggregate shows a Democratic lead of 0.4 points on the generic ballot, which is only about 1 point away from FiveThirtyEight’s current R+0.6 average but a whopping 3.3 points away from the RealClearPolitics average, which currently sits at R+2.9.
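For anyone checking these gaps, writing each average as a signed margin m (Democratic share minus Republican share, so a D+ lead is positive and an R+ lead is negative) makes the arithmetic explicit:

```latex
\begin{aligned}
m_{\text{ours}} &= +0.4, \qquad m_{\text{538}} = -0.6, \qquad m_{\text{RCP}} = -2.9 \\
m_{\text{ours}} - m_{\text{538}} &= 0.4 - (-0.6) = 1.0 \\
m_{\text{ours}} - m_{\text{RCP}} &= 0.4 - (-2.9) = 3.3
\end{aligned}
```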

On that note, some of you may ask about RealClearPolitics, which also publishes polling averages. The problem is that they selectively exclude polls, as Nate Silver and G. Elliott Morris have both pointed out, and their methodology has never been entirely clear about which polls they accept and which they omit. In our view, it is not good practice for a large, reputable, and widely cited website to do this, as it gives a seriously misleading picture of what is actually going on, which can be dangerous in an era of heightened partisanship. Polling and forecasting are highly important tasks that are becoming increasingly difficult, and data scientists should be completely transparent about their methodology. It is okay to be wrong; in fact, we would rather someone be wrong while laying out a clear, robust methodology than be right by accident, because the former is fixable while the latter is not.

Our average will have a graphic at the top and the full poll table and methodology description below, and it will be updated at least once per day. We will be open and transparent about everything we do for our generic ballot average. We will include any and every poll that meets the criteria we have laid out above, regardless of what the poll says. And if you ever find us missing a poll or feel that we are breaking from the practices listed above, please shoot us an email at info@split-ticket.org, DM us on Twitter @SplitTicket_, or contact me on Twitter @lxeagle17.

Acknowledgements: We would like to thank FiveThirtyEight for the public polling database from which we scraped this data, Joe Gantt for discussing methodology with us in great detail, and Dan Guild for inspiring this project and talking it through with us, as he has been tracking this same phenomenon with Senate polls for a while now.

I’m a software engineer and a computer scientist (UC Berkeley class of 2019 BA, class of 2020 MS) who has an interest in machine learning, politics, and electoral data. I’m a partner at Split Ticket, handle our Senate races, and make many kinds of electoral models.
