Reassessing 2020 Senate Races With Data-Driven Assessments Of Candidate Quality

When the pre-election discourse around candidate quality pops up, my mind always goes to Arthur Conan Doyle’s famous line that “it’s easy to be wise after the event”. It sums the problem up more accurately than any of us would care to admit — pre-election polling numbers, vibes, and anecdotal stories often color our perception of how a candidate is actually doing, and Twitter and the media are awash with tales of how a candidate has managed to “upend conventional wisdom” en route to presumably smashing through pre-election expectations.

Then the election happens and some of those assessments are borne out, as with Jon Ossoff and Raphael Warnock flipping Georgia for Democrats, while others fall flat on their face, like Sara Gideon losing decisively to Susan Collins despite leading nearly every pre-election poll. And suddenly, narratives turn on a dime: the “dynamic candidate” who was going to “take the state by storm” becomes “another tale of hubris”. Candidates who were royalty a week ago become personae non gratae in some circles, and tales of their generosity and warmth are replaced by whispers of disorganization and complacency.

The truth is, gauging candidate quality is often a fool’s errand: takes are grounded in hindsight more than anything else and ignore readily available data. As polarization grows and crossover voting declines, more and more races become easily explainable by factors like a state’s presidential lean. To really quantify how “good” or “bad” a candidate’s performance is, we must first establish a baseline against which to compare. Many techniques could work here, but we’ll try to estimate how a generic pair of candidates would have fared in an election given the same funding and environment, and then examine how far the actual result deviated from that expected baseline.

The question, then, becomes what factors we should control for. A large portion of the outcome in each Senate race can be explained by easily available and quantifiable factors, especially in presidential years. To quantify how much, we can assemble a model that utilizes candidate and party financial activity, presidential partisanship, candidate incumbency, and state racial and educational demographics to predict the outcome of each state’s Senate election. By controlling for these factors, we can get a rough idea of how a generic pair of candidates would have done if given the same resources and circumstances, and we can then assess candidate quality by examining their overperformance against expectations.

The single most important factor in determining a state’s election results is its presidential lean; in fact, ~85% of the variance in states’ 2020 Senate election results can be attributed to their 2020 presidential leans. This makes the national environment all the more important, as it is the single greatest determining factor in how a state will vote. For example, Iowa’s Senate election was one that just about any Democratic challenger would have lost; Theresa Greenfield put up a superb performance, outperforming Biden by ~1.5 points on margin, and yet she still lost comfortably because Trump won the state by 8 points.
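As a rough illustration of what a “~85% of the variance” claim means, here is a minimal sketch — using synthetic placeholder numbers, not the article’s actual dataset — of regressing two-party Senate margins on two-party presidential margins and reading off R², the share of variance explained by partisan lean alone:

```python
# Illustrative sketch with synthetic data (state count and noise levels are
# assumptions for demonstration, not the article's real inputs).
import numpy as np

rng = np.random.default_rng(0)
pres_margin = rng.normal(0, 15, size=33)             # hypothetical state presidential leans
senate_margin = pres_margin + rng.normal(0, 4, 33)   # Senate margin tracks lean, plus noise

# Fit y = a + b*x by ordinary least squares
A = np.column_stack([np.ones_like(pres_margin), pres_margin])
coef, *_ = np.linalg.lstsq(A, senate_margin, rcond=None)

# R^2 = 1 - (residual variance / total variance)
resid = senate_margin - A @ coef
r2 = 1 - resid.var() / senate_margin.var()
print(round(r2, 2))
```

With noise this small relative to the spread of partisan leans, R² lands near the ~0.85 range the article describes — the point being that presidential lean alone already explains the bulk of Senate outcomes.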

Senate races, however, are often won and lost on the margins more than anything, and in close states, this often makes the difference. It is here that we consider incumbency and spending. Incumbency provides a very real boost to candidates, especially among change-averse or content voters who have grown to recognize and like the incumbent, even across partisan lines; for example, Susan Collins almost certainly has this to thank for her remaining in the Senate.

Financial spending, meanwhile, is one of the few ways in which electorates can be actively swayed and shifted, whether by targeted messaging or by turnout operations. We can roughly quantify its direct impact too; in fact, in 2020, ~20% of the variance in Senate results could be attributed to spending by campaigns and affiliated groups. Lastly, we add in controls for the demographics of a state.

Controlling for all of the above factors, we can get a better gauge of candidate quality, which can be examined by the following question: How much did a race’s results deviate from what a generic pair of candidates would have been expected to get with the same circumstances and resources? For this, we’ll assemble a multiple linear regression model that takes two-way presidential results, incumbency, demographic data, and fundraising numbers as inputs, regressed against the actual two-way Senate results.
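A minimal sketch of that setup, again with synthetic placeholder data (the specific features, coefficients, and race count here are illustrative assumptions, not the article’s real inputs): fit ordinary least squares on the controls, then treat each race’s residual — actual result minus the generic-candidate expectation — as its overperformance score.

```python
# Hedged sketch of the regression described above, on fake data:
# regress two-party Senate margins on presidential lean, incumbency,
# demographics, and fundraising, then read overperformance off the residuals.
import numpy as np

rng = np.random.default_rng(42)
n = 33  # roughly one row per 2020 Senate race

X = np.column_stack([
    rng.normal(0, 15, n),    # two-way presidential margin
    rng.integers(-1, 2, n),  # incumbency: -1 R incumbent, 0 open seat, +1 D incumbent
    rng.normal(0, 1, n),     # demographic index (standardized, illustrative)
    rng.normal(0, 1, n),     # relative fundraising advantage (standardized)
])
# Synthetic "actual" Senate margins: a linear signal plus candidate-level noise
y = X @ np.array([1.0, 2.5, 1.5, 2.0]) + rng.normal(0, 2, n)

# Ordinary least squares with an intercept term
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Residual = actual result minus the model's generic-candidate expectation;
# this is the quantity the article builds its overperformance metric from
pae = y - A @ beta
```

Note that with an intercept included, the residuals sum to zero by construction, so overperformance here is always measured relative to the field, not in absolute terms.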

The resulting 2020 over/underperformance map is below.

The results are presented in tabular form as well.


A couple of notes of caution before we proceed:

  1. Just like baseball’s Wins Above Replacement (WAR) metric, our Performance Above Expected (PAE) metric must be contextualized with proper uncertainty and error bands; gaps of less than half a point are simply too close to draw definitive conclusions from when comparing two candidates. For an example of this, let’s compare John Cornyn and Mike Rounds. Cornyn has a PAE of 1.4 and Rounds has a PAE of 0.9. This metric is simply not granular enough to let us conclusively decide who performed better. However, what it can tell us is that both were exceptional, especially relative to their expected baselines. On the other hand, when comparing Mark Kelly (4.4 PAE) to, say, Barbara Bollier (2.2 PAE), we can have a reasonable degree of confidence that Kelly overperformed by more than Bollier, given the separation between the two.
  2. Although the two are increasingly independent, fundraising is not always easily separable from candidate quality. As a result, our model likely undershoots some candidates who vastly outraised their opponents in states of opposing partisanship; Mike Espy was probably a better candidate than we give him credit for, but because he outraised Cindy Hyde-Smith by $9 million, the model attributes much of his overperformance to his considerably higher fundraising. While that is likely correct, a Democrat running in Mississippi probably has a harder time making a race competitive in the national eye than a Democrat in Texas. What this model is saying is that Espy did around two points better than a similar candidate would have done given his $9 million fundraising advantage; what it may not capture is that almost no other candidate could have built a $9 million fundraising advantage in Mississippi. So why control for fundraising, then? Fundraising still moves the needle, and we want to assess how much of a candidate’s overperformance can be quantified by the numbers. Our purpose is to see how candidates did relative to the resources at their disposal, which helps us assess their relative strengths. It tells us which candidates deliver the best bang for the buck in campaigns, and which ones can close the gaps that money alone cannot.
  3. It is difficult to ascertain how much of an election’s deviation from the baseline was down to one candidate being *bad* vs another candidate being *good*. The conventional wisdom states that Mark Kelly was an exceptional candidate while Martha McSally was a mediocre one; however, how much of Kelly’s 4.4 PAE was down to him excelling and how much of it was down to McSally underperforming? Similarly, in North Carolina, widely-derided candidate Cal Cunningham’s PAE was actually +1.1, but how much of this was down to Thom Tillis being a relatively unpopular incumbent who was likely carried over the line by Trump’s coattails? Teasing this separation out is a near-impossible task, and so while we’ll refer to PAE in terms of single candidates for brevity (e.g. Kelly had a 4.4 PAE, Collins had an 11.8 PAE, etc), we encourage the reader to draw their own conclusions regarding how much of a race’s deviation was down to which candidate.

Examining the 2020 Senate map under the lens of “performance vs expectations”, a few things stand out. Firstly, it is entirely likely that candidate quality was decisive in helping Democrats clinch Senate wins in Georgia and Arizona; results around the nation and fundamentals suggest they should have lost both races, but Jon Ossoff (+3.3 PAE) and Mark Kelly (+4.4 PAE) overperformed their baselines by enough to flip seats in races they probably should have lost. On the flip side, Susan Collins (+11.8 PAE) was arguably the best candidate of the 2020 cycle and almost certainly saved the race for Republicans in what once looked like a certain Democratic pickup by virtue of her own candidate strength.

Secondly, despite the national underperformance relative to expectations (losses in North Carolina, Maine, Iowa, and Montana were all considered major disappointments by many), Democratic recruitment in key swing states was arguably quite stellar; the only swing-state battleground recruit who underperformed was Sara Gideon, and it is difficult to tell whether anyone could really have beaten Susan Collins, given her exceptional candidate strength.

Steve Bullock (+8.0 PAE), Mike Espy (+2.2 PAE), Theresa Greenfield (+4.0 PAE), Jaime Harrison (+1.3 PAE), and Barbara Bollier (+2.2 PAE) all lost their races, but they performed better than what Democrats might have been expected to manage given the national results. Greenfield, Bullock, and Espy, in particular, were genuinely phenomenal nominees who were simply sunk by ticket splitting declining to a record low in this election — just about nobody could have won their races, given the magnitude by which Donald Trump won their states.

In fact, we can take this a step further; oft-ridiculed Cal Cunningham (+1.1 PAE) probably didn’t cost Democrats the North Carolina seat despite his scandals. The regression suggests that he might have actually done a touch better than what a generic Democrat should have achieved against a generic Republican in the same circumstances, and while that may seem hard to believe, it’s at least enough to indicate that another standard Democrat probably wouldn’t have won either. Biden lost the state by over a percentage point, which makes it difficult to argue that Cunningham could have done what no other candidate (save for Susan Collins) managed nationally: winning a Senate race while his party’s presidential nominee lost the state. It’s therefore likely that candidate quality was not the deciding factor in costing Democrats the North Carolina Senate race, contrary to the conventional wisdom that settled in after November 2020.

In a similar vein, many post-election assessments regarding losing candidates tend to be misleading at best and often misguided. The tendency after the election was to mock candidates like John James (+0.5 PAE) and Amy McGrath (+5.7 PAE) for losing races into which money was shoveled at a record rate, but this isn’t necessarily something that the data supports. James came within a hair of winning Michigan and outran Trump while running against a (supposedly) strong incumbent in Gary Peters, while Amy McGrath outperformed Biden by over 6 percentage points.

In James’ case, this was his second statewide overperformance, and so we might be more comfortable in concluding that he wasn’t actually as bad a candidate as one might assume from his status as a two-time election loser. In fact, James overperformed polling and expectations in both his 2018 and 2020 races, and in a better Republican environment, he probably would have won.

In McGrath’s case, McConnell’s unpopularity likely played a sizable role in this overperformance, which can also be interpreted as an 8-point Republican underperformance more than anything. That said, among the biggest criticisms of McGrath was that she spent a lot of money for nothing, and it’s worth noting that the margins in this race indicate a Democratic overperformance even after controlling for the immense amounts of money poured into Kentucky’s Senate race. The problem was simply that no Democrat could conceivably have won in Kentucky in 2020, but that isn’t really down to McGrath’s campaign or candidacy — Andy Beshear himself wouldn’t have won this race in a presidential year.

Lastly, although much of the focus was on the disappointing results of Democratic challengers, it was actually Democratic incumbents across the nation who significantly underperformed their partisan baselines. Only two out of the eleven Democratic incumbents running for re-election overperformed expectations: Jack Reed (+13.1 PAE) and Jeanne Shaheen (+6.2 PAE).

There could be multiple explanations for this, and the “why” is far more difficult to analyze than the “what”. One theory would claim that this has to do with incumbents “coasting” and not taking their re-elections as seriously in light of overly rosy Democratic polling numbers. Another could point to a potentially unfriendlier down-ballot environment for 2020 Democrats. However, it’s worth noting that while many felt that most Democratic challengers underperformed in the immediate 2020 aftermath, the likelier reality is that the underperformance was mostly among incumbent Democratic senators. Challengers generally did about as well as one could have expected given the national environment that actually materialized — the polling misses aren’t the fault of any specific candidate!

Gauging candidate quality is still one of the most difficult tasks in election handicapping, and these assessments impact everything from the race ratings of forecasters to funding decisions made by PACs and parties. But although we may not have a reliable way to gauge this ex ante, we have managed to establish a decent technique for doing so ex post, which helps us get a handle on how candidates actually performed relative to the resources they had and to the rest of the nation’s results. And that’s still a big step forward.

Editor’s note: On 04/02/2022, this article was updated to use fresher data and included slight methodological tweaks to better account for small-state skew. The directional findings remain the same, but the map and table have been updated accordingly.

I’m a software engineer and a computer scientist (UC Berkeley class of 2019 BA, class of 2020 MS) who has an interest in machine learning, politics, and electoral data. I’m a partner at Split Ticket, handle our Senate races, and make many kinds of electoral models.