Our 2024 Polling Aggregators (And How They Work)

With the election cycle in full swing and the electoral picture stabilizing, it can be easy to overreact to noise in individual polls and develop narratives around them. That’s why today at Split Ticket, we’re releasing our polling aggregates for 2024. Our hope is that you can use them to develop better insight into where the election actually stands, free from pundit narratives that may be detached from public opinion.

What They Say

In most of the key swing states, the race for President could not be tighter. In Arizona, Georgia, Nevada, and North Carolina, our averages suggest a lead of no more than half a point for either candidate. In Pennsylvania, the current tipping-point state in our averages, Harris retains a small, one-point edge. She has slightly more substantial leads in Michigan and Wisconsin, but they are far from commanding. In other words, the swing states are looking exceedingly close; a minor shift in either direction could tip the scales.

Our polling averages for Senate races, however, are a different story. Democratic candidates for Senate are broadly outrunning Harris, and by more than just a point or two. In the key swing states that are also home to Senate races (Arizona, Michigan, Nevada, Pennsylvania, and Wisconsin), Democrats comfortably lead by 5 points or more. They are performing fairly well in more Republican territory as well. In Ohio, Sen. Sherrod Brown leads by nearly 7 points in our average, while Montana Sen. Jon Tester trails by just under a point.

Still, it is worth noting that Democrats need to sweep most of the competitive races to retain control of the chamber, even with a Vice President Tim Walz available to break ties. If Trump wins, the polls suggest either Florida or Texas would decide Senate control (with Democrats headed for an assured loss in West Virginia, they would need to flip a Republican seat to hold on to 51 votes). If Harris wins, our aggregates suggest the battle for the Senate would come down to Montana, the closest race in our averages.

A word of caution regarding the Senate averages: there are far fewer Senate polls than there are presidential polls. As such, the data is significantly sparser for some of the states, like Texas and Florida. The overall picture, however, remains relatively clear.

How They Work

In a nutshell, our aggregates control for poll age, pollster quality, population, sample size, and house effects. We accept each poll that FiveThirtyEight collects, except for those from ActiVote and Trafalgar, and we default to the full-field matchup if a pollster polls both the full-field and the head-to-head versions of the race.
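Concretely, each poll enters the average with a handful of attributes. Here is a minimal sketch in Python of what such a record might look like; the field names are ours for illustration, not Split Ticket’s actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Poll:
    """One poll, with the fields the aggregate controls for (names illustrative)."""
    pollster: str
    field_date: date                 # when the poll was taken, for age decay
    rating: float                    # FiveThirtyEight star rating, 0.0 to 3.0
    sample_size: int                 # number of respondents
    population: str                  # "lv" (likely voters) or "rv" (registered voters)
    partisan_sponsor: Optional[str]  # "dem", "rep", or None, for house effects
    margin: float                    # Dem minus Rep, in percentage points
```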

Let’s break down why we control for each factor. It’s obvious that a poll taken today will be more representative of the current state of play than a poll taken two months ago. What is less obvious is how to handle the impact of poll age. Some aggregators (like RealClearPolitics) handle this by including all polls within a certain window, weighted equally, and then dropping them once that window passes.

In our opinion, this is not a good idea: it produces artificial jumps and drops when outliers leave the average. These shifts are induced purely by polls abruptly “dropping out” of the window, not by any real change in public opinion, and they signal movement to readers even when there has been none.

Instead, allowing the weight of a poll to decay slowly over time yields a significantly more stable average. For this reason, we use an exponential decay function based on the time since the poll was taken. (As an additional benefit, this also avoids incredibly silly commentary like “Wait for tomorrow and the Split Ticket Georgia average will jump from Harris +1 to Harris +2, because that Trump +6 outlier is going to age out of the 21-day window!”)
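To show the shape of such a function, here is a minimal sketch. We have not published our decay constant, so the 14-day half-life below is purely an illustrative assumption; the point is the smooth fade, not the specific number.

```python
from datetime import date

def age_weight(field_date: date, today: date, half_life_days: float = 14.0) -> float:
    """Exponentially decay a poll's weight with age: it halves every
    `half_life_days`. The half-life is an assumed placeholder; what matters
    is that weight fades smoothly instead of dropping to zero at a cutoff."""
    age_days = (today - field_date).days
    return 0.5 ** (age_days / half_life_days)

# A four-week-old poll keeps a quarter of its weight rather than
# abruptly "aging out" of a 21-day window:
print(age_weight(date(2024, 8, 1), date(2024, 8, 29)))  # 0.25
```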

We also weight and control for pollster quality. Pollsters that are methodologically transparent and use quality data sources tend to produce work that is more trustworthy and generally more accurate. To evaluate a pollster’s quality, we use FiveThirtyEight’s “star” ratings, which grade pollsters based on a combination of methodology, transparency, and accuracy.

We won’t bore you with the exact mathematical function we use, but we’ll give a good example of how our “pollster quality” weighting works in practice. Let’s assume that on a certain day, three separate polls surveying 800 likely voters in Georgia are conducted by three different agencies. The first is by the New York Times/Siena College, which has a 3.0/3.0 rating on FiveThirtyEight. The second is by Redfield & Wilton, which has a 1.8/3.0 rating. And the third is by Clout Research, which has a 0.7/3.0 rating.

The only factor differentiating these three polls is pollster quality. They are identical in every other respect, but it is obvious, based on both rating and track record, that they should not all be given the same consideration by a reader (which is another reason we don’t do unweighted averages). In our aggregator, the Redfield & Wilton poll would receive 53% of the weight that the Times/Siena poll does, while the Clout Research poll would receive just 16% of that weight.
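For readers who want a concrete picture: the two ratios above (53% and 16%) happen to be reproduced almost exactly by a simple power law on the star rating. Treat the exponent below as reverse-engineered guesswork consistent with those two numbers, not our actual formula.

```python
def quality_weight(rating: float, exponent: float = 1.24) -> float:
    """Weight relative to a perfect 3.0-star pollster. The power-law form
    and exponent are inferred from the two ratios quoted in the text,
    not taken from Split Ticket's actual code."""
    return (rating / 3.0) ** exponent

print(round(quality_weight(3.0), 2))  # 1.0  (NYT/Siena)
print(round(quality_weight(1.8), 2))  # 0.53 (Redfield & Wilton)
print(round(quality_weight(0.7), 2))  # 0.16 (Clout Research)
```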

This is very much a design choice, and we want to stress that there are no “right” answers. In our opinion, this yields a polling average that is inoculated against low-quality pollsters “flooding the zone” (like what we saw in 2022, especially at the state level), and reflects the high-quality data when we have it.

Of course, pollster quality is not a guarantee of accuracy — as readers may remember, 2020 saw agencies like Trafalgar and Rasmussen succeed. But in 2022, these were some of the worst pollsters around, and they also yielded hilariously bad estimates across the board. In general, the data shows that agencies that use high-quality data with robust methodologies have empirically performed better over a long period of time, and for that reason, we place more trust in them.

We also control for sample size. Surveys with smaller sample sizes have greater variance and larger margins of error; all else being equal, a poll of 1,000 likely voters in Montana should generally get more weight than a poll of 400 likely voters. The exact function we use is the square root of the sample size divided by 800 (fairly similar to what Nate Silver did at FiveThirtyEight), and we cap samples at a certain size in order to prevent one poll from dominating the averages.
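A sketch of this weight follows, reading the formula as the square root of (sample size / 800); dividing the square root of the sample size by 800 instead would give identical relative weights. The cap of 2,000 respondents is our assumption for illustration, since the text says only that a cap exists.

```python
import math

def sample_weight(sample_size: int, cap: int = 2000) -> float:
    """sqrt(sample size / 800), with the sample capped so a single enormous
    poll can't dominate. The cap of 2,000 is an assumed value."""
    return math.sqrt(min(sample_size, cap) / 800)

print(round(sample_weight(1000), 2))  # 1.12
print(round(sample_weight(400), 2))   # 0.71, roughly 63% of the 1,000-voter poll's weight
```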

We also control for population — i.e. whether the survey is of “likely voters” (LV) or “registered voters” (RV). All else being equal, a survey of 800 RVs will get about 82% of the weight that a poll of 800 LVs gets. If an agency polls both registered and likely voters in the same survey, we take just the LV result. (Polls of adults are not considered, of course, because that is not a voting group.)
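The population adjustment is the simplest of the weights; a minimal sketch:

```python
def population_weight(population: str) -> float:
    """LV polls get full weight; RV polls get 82% of it, per the text.
    Polls of all adults are excluded upstream rather than downweighted."""
    weights = {"lv": 1.0, "rv": 0.82}
    return weights[population.lower()]

print(population_weight("rv"))  # 0.82
```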

The last thing we control for is “house effects”. By this, we mean the sponsor’s partisan affiliation. Back in 2017, Nate Silver conducted a fairly thorough study of internal polls in the FiveThirtyEight database and found that released internal and partisan polls were, on average, off by four or five percentage points in favor of their sponsor’s party. To account for this, we apply a minor adjustment away from the sponsor’s party if a poll is labeled as partisan on FiveThirtyEight. We also downweight a partisan poll to two-thirds the weight of an unaffiliated poll, so that these polls cannot swamp our averages.
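Here is a sketch of that step, plus how all five factors might combine into a single average. The two-thirds multiplier is from the text; the 2-point shift is an assumed size (the text says only “a minor adjustment”), and the `aggregate` function reuses the `Poll` record and weight functions from the earlier sketches, so none of this is our literal implementation.

```python
from datetime import date
from typing import Optional

def house_effect(margin: float, sponsor: Optional[str],
                 shift: float = 2.0) -> tuple[float, float]:
    """Adjust a partisan poll's margin away from its sponsor's party and
    downweight it to 2/3 of a nonpartisan poll. The 2-point shift is an
    assumed size; margin is Dem minus Rep."""
    if sponsor is None:
        return margin, 1.0
    adjusted = margin - shift if sponsor == "dem" else margin + shift
    return adjusted, 2.0 / 3.0

def aggregate(polls: list[Poll], today: date) -> float:
    """Weighted mean of house-effect-adjusted margins, multiplying together
    the factor weights sketched above (age, quality, sample size, population,
    and partisan downweight)."""
    total_w = total_wm = 0.0
    for p in polls:
        margin, partisan_w = house_effect(p.margin, p.partisan_sponsor)
        w = (age_weight(p.field_date, today)
             * quality_weight(p.rating)
             * sample_weight(p.sample_size)
             * population_weight(p.population)
             * partisan_w)
        total_w += w
        total_wm += w * margin
    return total_wm / total_w
```

Multiplying the weights means a poll must score well on every dimension at once to dominate the average, which is the design intent behind controlling for each factor separately.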

As a final parting thought, consider that there are three* major, widely cited presidential polling aggregates in the news right now: FiveThirtyEight, The New York Times, and Silver Bulletin. At the time of publication, all of them broadly find something very similar, both to each other and to us; the differences are generally minor.

With that said, there’s no fun in relying on what everyone else has done, especially when we have our own unique flavor and approach to offer. Besides, we do have something that nobody else does yet: Senate polling aggregates, done in the same format as our presidential ones. For now, we’ll be publicly posting daily graphics for all 9 Senate battleground states and all 7 core Presidential battlegrounds.

We hope you’ll find these useful; we’ll be using them for our House and Senate models (and we’ll release model updates very soon!). The trackers will be updated daily. You can find the Senate one here and the Presidential one here.

*Three other excellent aggregates are found at VoteHub, JHKForecasts, and RaceToTheWhiteHouse. The people who run those sites are extremely smart — we recommend checking them out.

I’m a computer scientist who has an interest in machine learning, politics, and electoral data. I’m a cofounder and partner at Split Ticket and make many kinds of election models. I graduated from UC Berkeley and work as a software & AI engineer. You can contact me at lakshya@splitticket.org

I am an analyst specializing in elections and demography, as well as a student studying political science, sociology, and data science at Vanderbilt University. I use election data to make maps and graphics. In my spare time, you can usually find me somewhere on the Chesapeake Bay. You can find me at @maxtmcc on Twitter.

I make election maps! If you’re reading a Split Ticket article, then odds are you’ve seen one of them. I’m an engineering student at UCLA, and electoral politics are a great way for me to exercise creativity away from schoolwork. I also run and love the outdoors! You can contact me @politicsmaps on Twitter.
