How can Democrats take back CA-25?

My previous political analyses have looked back at the 2016 election, but today I want to start looking forward to next year’s midterms. There are good reasons to think that Democrats will gain ground in both the Senate and the House, especially after Alabama voters’ recent rejection of Roy Moore and the results of the New Jersey and Virginia elections earlier this year. FiveThirtyEight goes so far as to call the Democrats the favorites to win the House.

But one election in particular caught my eye: California’s 25th Congressional district, which is currently held by Republican Steve Knight. Knight is one of Congress’s most prominent anti-science members, and yet he somehow serves on the House Committee on Science, Space, and Technology.

CA-25 is being contested by the Democratic challenger Jess Phoenix, who stands in contrast to Steve Knight by being an actual scientist (a volcanologist, in fact). A Public Policy Polling poll recently found that Steve Knight is losing to a generic Democrat by 4 points, and given the momentum that Democrats currently have, it’s tempting to think that Phoenix will cruise to victory in 2018. But the election is almost a year away, and anything can happen between now and then, a painful lesson that hopefully we all learned from 2016.

House incumbents have won well over 90% of their reelection bids in recent decades. Phoenix also faces an uphill battle as a woman (women currently hold only 84 of the 435 House seats) and as a scientist (only 2 current House members, Bill Foster of Illinois’s 11th district and Jerry McNerney of California’s 9th district, hold a PhD in a scientific field).

I thought I would use this blog post to do what I can to help out her campaign. Of course, I don’t know how sophisticated her campaign’s data analytics team is, so perhaps this information won’t be of much use. But it does show an interesting way that the machine learning techniques I’ve discussed previously can be put to practical use in an ongoing campaign.

Breaking down the data

As far as I know, no states report election results at a more granular level than the county, and since congressional districts rarely line up with county boundaries, it’s not easy to break down a vote for Congress to any level smaller than the district. But we can get more granular data at the Census tract level from American FactFinder (published by the Census Bureau), at the cost of fewer data points per tract. We can then apply a machine learning algorithm trained at the county level to the smaller tracts.
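To make the county-to-tract idea concrete, here is a minimal sketch, not the pipeline actually used for this post. It fits a hand-rolled elastic net (linear regression with L1 and L2 penalties, solved by coordinate descent) on a few made-up "county" rows and then applies it to made-up "tract" rows described by the same features. All numbers, the function names, and the `alpha` and `l1_ratio` settings are assumptions chosen for illustration; in practice a library implementation such as scikit-learn's `ElasticNet` would be the more sensible choice.

```python
from statistics import mean, pstdev

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(rows, targets, alpha=0.01, l1_ratio=0.5, sweeps=500):
    """Coordinate descent on standardized features; returns a predict function."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    X = [scale(row) for row in rows]
    n, p = len(X), len(columns)
    intercept = mean(targets)  # features are centered, so this is the intercept
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Covariance of feature j with the residual that ignores feature j.
            rho = 0.0
            for x, y in zip(X, targets):
                others = sum(w * v for k, (w, v) in enumerate(zip(weights, x)) if k != j)
                rho += x[j] * (y - intercept - others)
            rho /= n
            z = sum(x[j] ** 2 for x in X) / n
            weights[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))

    def predict(row):
        raw = intercept + sum(w * v for w, v in zip(weights, scale(row)))
        return min(1.0, max(0.0, raw))  # a vote share must stay in [0, 1]

    return predict

# Made-up county rows: [population density (people per sq. mile), percent white]
county_features = [[50, 90], [200, 80], [1000, 60], [3000, 45], [8000, 30], [100, 85]]
county_dem_share = [0.25, 0.35, 0.50, 0.62, 0.78, 0.30]

predict = fit_elastic_net(county_features, county_dem_share)

# Made-up tract rows using the same two features as the counties.
dense_tract = predict([6000, 35])
rural_tract = predict([150, 88])
print(round(dense_tract, 2), round(rural_tract, 2))
```

The tract rows are scaled with the means and standard deviations learned from the counties, which is what lets a model trained at one geographic level be reused at another.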

One caveat to the results below is that I trained the algorithms on the presidential election results, not other Congressional races. Mostly that’s because I was too lazy to download the district-by-district demographic data, but some time in the future I’ll probably get around to that, perhaps getting somewhat more accurate results. But I don’t expect much to change- our politics these days is so partisan that I don’t expect many Trump voters to vote for Phoenix, and vice versa.

The only other potentially tricky aspect of this analysis was making sure that none of the demographic data points I used is tied to an absolute scale, for instance by training on population density rather than total population. Using the nine parameters listed at the end of this post, the two models I tried (an elastic net regression and a neural network) both seemed to converge to a fairly stable solution. First, here are the results from the elastic net regression (which is essentially a linear regression with L1 and L2 penalties on the coefficients):

Figure 1: An elastic regression algorithm trained on the nationwide county-level voting results of the 2016 presidential election, used to predict the vote of California's 25th Congressional district.


And the results from a neural network:

Figure 2: Results from a neural network trained on the nationwide county-level voting results of the 2016 presidential election, used to predict the vote of California's 25th Congressional district.


Pretty consistent if you ask me!

What’s next?

My suggestion to Jess Phoenix is to spend her time working to improve turnout in the blue areas of the maps above. By my calculation, the 2016 election saw only around 52% turnout for the Congressional race, and it strikes me as far more difficult to convince voters to switch their allegiance than to encourage more of your own base to turn out to vote.
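As a rough illustration of how that suggestion could be turned into a priority list, the sketch below computes turnout as votes cast divided by voting-age population and ranks tracts by the net Democratic votes gained per extra percentage point of turnout. The tract names, all of the numbers, and the field names are made up, and the ranking rests on a loud assumption: that new voters split the same way as the tract's predicted Democratic share. It is not the calculation behind the 52% figure.

```python
def turnout_rate(votes_cast, voting_age_population):
    """Share of the voting-age population that cast a ballot."""
    return votes_cast / voting_age_population

def net_gain_per_point(tract):
    """Net Democratic votes from one extra percentage point of turnout.

    Assumption: the additional voters split like the tract's predicted share,
    so each one is worth (share) - (1 - share) = 2 * share - 1 net votes.
    """
    extra_voters = 0.01 * tract["voting_age_population"]
    return extra_voters * (2 * tract["predicted_dem_share"] - 1)

# Made-up tracts; predicted_dem_share would come from a model like those above.
tracts = [
    {"name": "Tract A", "voting_age_population": 4000, "votes_cast": 1800,
     "predicted_dem_share": 0.70},
    {"name": "Tract B", "voting_age_population": 5000, "votes_cast": 3500,
     "predicted_dem_share": 0.55},
    {"name": "Tract C", "voting_age_population": 3000, "votes_cast": 1500,
     "predicted_dem_share": 0.40},
]

# Tracts where a turnout push helps most come first.
ranked = sorted(tracts, key=net_gain_per_point, reverse=True)

# District-wide turnout is total votes over total voting-age population.
district_turnout = turnout_rate(
    sum(t["votes_cast"] for t in tracts),
    sum(t["voting_age_population"] for t in tracts),
)

for tract in ranked:
    rate = turnout_rate(tract["votes_cast"], tract["voting_age_population"])
    print(tract["name"], f"turnout={rate:.0%}", f"net gain per point={net_gain_per_point(tract):.1f}")
```

Under these assumptions a strongly blue, low-turnout tract rises to the top, while a red-leaning tract shows a negative gain, which matches the intuition that turning out your own base beats a blanket turnout drive.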

Furthermore, strongly Democratic places generally have trouble getting strong turnout because young people vote at a much lower rate than older voters. Just take a look at this bubble plot, showing the results from the 2016 Presidential election:

Figure 3: Voter turnout rates versus median age in the 2016 Presidential election. Each bubble is a county, whose size represents its population and color represents its vote. Young Democratic counties clearly have some trouble getting a strong turnout.


So if I were Phoenix, I would first update my logo (yikes!), and then I would head to the blue areas in Figure 1 and spend my time registering young people to vote. Good luck, Jess!

Data Points used here

  1. Population Density
  2. Percent White
  3. Percent Black
  4. Percent Hispanic
  5. Percent Asian
  6. Percent Native American
  7. Percent Youths
  8. Median Age
  9. Homeownership Rate
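
The nine inputs above are all rates or ratios, which is what allows a model trained on counties to be applied to much smaller tracts. The sketch below shows one way such scale-free features might be derived from raw counts. The field names are hypothetical (they are not the actual Census column names), and the definitions of "youths" as residents under 18 and of homeownership as owner-occupied units over occupied units are assumptions made for illustration.

```python
def scale_free_features(raw):
    """Turn raw counts for one county or tract into nine scale-free inputs."""
    population = raw["population"]

    def percent(count):
        return 100.0 * count / population

    return {
        "population_density": population / raw["land_area_sq_mi"],
        "percent_white": percent(raw["white"]),
        "percent_black": percent(raw["black"]),
        "percent_hispanic": percent(raw["hispanic"]),
        "percent_asian": percent(raw["asian"]),
        "percent_native_american": percent(raw["native_american"]),
        "percent_youths": percent(raw["under_18"]),  # assumed definition of "youths"
        "median_age": raw["median_age"],             # already scale-free
        "homeownership_rate": 100.0 * raw["owner_occupied_units"] / raw["occupied_units"],
    }

# A made-up tract.
small_tract = {
    "population": 4000, "land_area_sq_mi": 2.0,
    "white": 2000, "black": 400, "hispanic": 1200, "asian": 300,
    "native_american": 100, "under_18": 1000, "median_age": 34.5,
    "owner_occupied_units": 900, "occupied_units": 1500,
}

# A made-up county with the same composition but 100 times the size.
big_county = {
    key: value if key == "median_age" else value * 100
    for key, value in small_tract.items()
}

tract_features = scale_free_features(small_tract)
county_features = scale_free_features(big_county)
print(tract_features == county_features)
```

Because every count is divided by a matching total, the tract and the county produce identical feature values even though their raw populations differ by a factor of 100.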