Projections Before the Polls Close: What Could Possibly Go Wrong?

Voting in Ohio during the 2012 presidential election. Low turnout early in the day among demographic groups likelier to vote for President Obama made his campaign team think he was going to lose the state (he didn’t). Credit Michael F. McElroy for The New York Times

In the history of the Obama campaign’s storied analytics operation, the effort to model the results live on Election Day, before the votes were tallied, was undoubtedly a low point.

“That was the worst 12 hours of my life,” said David Shor, a senior data scientist at Civis Analytics who was in the “Cave” — the Obama analytics boiler room — on Election Day of 2012. By late morning, some in the Obama team concluded that President Obama was losing Ohio.

This year, Election Day could be the worst 12 hours for all of us.

In a first, rather than wait for election results to be tallied at county courthouses and announced by The Associated Press or the TV networks, a company called VoteCastr will project the results in real time. The results will be published on Slate and Vice.

It might make you want to throw up.

A lot can go wrong, as it did for the Obama team.

In 2012, the Obama campaign brought in top talent from Google and Catalist, a Democratic data firm, to estimate the results of the election in real time. The early results did not look good for Mr. Obama. At first, the Obama team had dismissed the data pointing toward a low turnout among young, nonwhite and Democratic-leaning voters.


In the telling of Yair Ghitza, the chief survey scientist at Catalist, who designed the Obama campaign’s Election Day modeling, senior analysts had concluded that the trends could be real by late morning.

Elan Kriegel, now the analytics director of the Clinton campaign, left for the bathroom to throw up.

This story is not a secret. It was reported in Jonathan Alter’s book about the 2012 campaign, “The Center Holds.” It’s also described in Mr. Ghitza’s dissertation.

But it’s largely unknown to the public, which has little experience with the stability or accuracy of these models. People might expect the “uncannily accurate” estimates that Sasha Issenberg, a VoteCastr partner, promised in a September article.

In a telephone interview, Mr. Issenberg described the history of these efforts somewhat differently, saying he’s heard “horror stories” about Election Day efforts to model the results.

For readers unaccustomed to live Election Day forecasting, the VoteCastr effort could be a horror story as well. This is not because the VoteCastr effort is unserious or doomed to fail. It takes many of the steps needed to do the job well, or at least as well as it can be done.


It has teamed with HaystaqDNA, a Democratic analytics firm led by Ken Strasma, the Obama campaign’s targeting director in 2008. HaystaqDNA is conducting large surveys of 10,000 respondents per state to power statistical models that estimate the vote preference of every voter in a state.

The VoteCastr team will monitor 100 precincts — a good number — in each battleground state, periodically reporting on the turnout in real time. The data will be used to infer whether turnout is higher or lower than expected in certain areas or among certain groups.

But even a serious effort like this one — let alone the Obama campaign’s effort in 2012 — faces big challenges.

One obstacle is that turnout varies over time: Younger voters don’t usually vote in the morning, and many voters in nine-to-five jobs might surge to the polls in the evening.

This was one of the big challenges for the Obama campaign in 2012. By 10:30 a.m., its model had concluded that young and nonwhite voters weren’t showing up in Ohio. These trends worked themselves out by the end of the day, but not before causing considerable consternation in the Cave.

The VoteCastr model makes no effort to adjust for this. It will treat turnout as if it’s uniform throughout the day: If 10 percent of the day has passed, it will expect 10 percent of the vote to be counted. This can cause considerable variance in the estimates as the hours go by.
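To see what the uniform-turnout assumption implies, here is a minimal sketch in Python. The precinct numbers are hypothetical; only the 13-hour voting day (Ohio's polls run 6:30 a.m. to 7:30 p.m.) comes from the real schedule.

```python
# Sketch (hypothetical numbers): projecting final turnout from a midday
# report under the uniform-turnout assumption.

def project_final_turnout(votes_so_far, hours_open, poll_hours=13.0):
    """Assume ballots arrive at a constant rate all day: if 40% of the
    voting hours have passed, expect 40% of the votes to be cast."""
    fraction_elapsed = hours_open / poll_hours
    return votes_so_far / fraction_elapsed

# Polls open 6:30 a.m. to 7:30 p.m. (13 hours). By 10:30 a.m. (4 hours in),
# a precinct has reported 200 votes.
midday_projection = project_final_turnout(200, hours_open=4.0)
print(round(midday_projection))  # 650 under the uniform assumption

# If many voters in that precinct actually vote after work, the true final
# count could be far higher -- say 900 -- and the midday projection would
# understate turnout by more than a quarter, the kind of swing that
# alarmed the Cave in 2012.
```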

It’s also hard to infer what shifts in turnout by precinct mean for certain groups. If the turnout in a well-educated precinct is down 5 percent, does that mean that the turnout among well-educated voters, who tend to support Hillary Clinton, is down? Or does it mean that well-educated Republican turnout is down?


The VoteCastr model’s approach is defensible, if ham-handed: If it believes that 500 voters in a precinct will vote, it will assume that the 500 likeliest voters have turned out. There’s a danger here: Some of the less likely voters will indeed show up, and they tend to lean Democratic.
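A rough sketch of that "likeliest voters first" assumption, using an entirely synthetic voter file. The turnout scores, support probabilities and the Democratic lean of low-propensity voters are all invented for illustration, not drawn from VoteCastr's actual models.

```python
# Sketch (synthetic voter file): crediting ballots to the likeliest voters.
import random

random.seed(0)
voters = []
for _ in range(800):
    turnout_score = random.random()
    # Invented pattern: lower-propensity voters lean more Democratic.
    clinton_prob = 0.45 + 0.2 * (1 - turnout_score)
    voters.append((turnout_score, clinton_prob))

def implied_clinton_share(voters, n_turned_out):
    """Assume the n likeliest voters are the ones who actually showed up."""
    likely_first = sorted(voters, key=lambda v: v[0], reverse=True)
    turned_out = likely_first[:n_turned_out]
    return sum(p for _, p in turned_out) / n_turned_out

# If 500 ballots are cast, the model credits them to the 500 highest
# turnout scores. Because low-propensity voters here lean Democratic,
# the inferred electorate looks worse for Mrs. Clinton than one that
# included some unlikely voters who showed up anyway.
top_500_share = implied_clinton_share(voters, 500)
everyone_share = sum(p for _, p in voters) / len(voters)
print(top_500_share < everyone_share)  # True
```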

Another challenge is estimating vote preferences in the first place. The turnout by precinct doesn't say much about how people voted, only who voted. The estimates for how people voted come from polling data, and from the models derived from it.

Many of the problems facing polls — like the possibility that undecided voters or the supporters of minor-party candidates will break one way — apply to the models as well.
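A minimal sketch of how that works, with invented turnout counts and support rates: the live estimate is essentially who-voted (turnout) multiplied by how-they-voted (poll-derived support), so any systematic error in the underlying poll flows straight through to the projection.

```python
# Sketch (hypothetical numbers): a poll error propagates fully into the
# live projection, no matter how well turnout is measured.

def projected_margin(turnout_by_group, support_by_group):
    """Clinton's two-way margin implied by group turnout counts and
    modeled support rates."""
    total = sum(turnout_by_group.values())
    clinton = sum(turnout_by_group[g] * support_by_group[g]
                  for g in turnout_by_group)
    return 2 * clinton / total - 1

turnout = {"college": 600, "non_college": 400}        # observed at the polls
support = {"college": 0.58, "non_college": 0.44}      # from a pre-election poll

print(round(projected_margin(turnout, support), 3))   # 0.048

# If the poll systematically overstated Clinton support by 3 points in
# every group, the projection inherits the full error:
biased = {g: p + 0.03 for g, p in support.items()}
print(round(projected_margin(turnout, biased), 3))    # 0.108
```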


It’s a challenge we’ve faced in our own North Carolina early-vote tracker, which is based on a poll showing Mrs. Clinton up seven points in the state. We know exactly who voted, but ultimately we’re stuck with a pretty favorable sample for Mrs. Clinton.

Readers have one advantage in the case of our early-voting tracker: You know that our North Carolina poll was very positive for Mrs. Clinton.

In the case of the VoteCastr effort, readers won’t have any idea whether its estimates were strong or weak for Mrs. Clinton heading into the election. It’ll be difficult for readers to untangle the real news — whether Election Day turnout has deviated from expectations — from the other factors that drive VoteCastr’s estimates.

Election night forecasting, based on actual results, is a different challenge. Forecasters don’t have these issues (though they have their own). The Upshot will be forecasting the results based on actual returns once there’s a sufficient amount of data.

The VoteCastr team will supply Slate with some of the data needed to help readers untangle its estimates. For instance, readers will know whether turnout is up or down in Democratic or Republican precincts. But it will probably still be hard for readers to draw these inferences.

Julia Turner, Slate’s editor in chief, said the goal was “to make it impossible to separate numbers from the context.” There will be contextual language embedded in the graphics and in social media. The Slate team members will de-emphasize the horse race number. Josh Voorhees, a Slate political writer, will provide commentary throughout the day.

On Saturday, Mr. Voorhees wrote an article explaining the VoteCastr effort, including many of the potential sources of uncertainty — like the potential errors in polling. He describes it, appropriately, as an experiment.

Parts of the experiment have the potential to be extremely valuable and to improve the future coverage of elections. The estimates for the early vote in states like Colorado, North Carolina, Florida and Nevada — where individual-level data on turnout is available — will be informative and more useful than many pre-election polls. By the end of the evening, the estimates of turnout by precinct should be a valuable complement to deeply imperfect exit-poll-based estimates of the composition of the electorate.

But the live estimates and projections could be a wild ride. In 2012, the Obama team had the top analysts in politics, and plenty of previous experience on Election Day. Its model had formal measures of uncertainty. Yet even it found itself wondering whether it would lose Ohio.

On Tuesday, readers will be exposed to live results, for the first time, with little understanding of the amount of uncertainty. There will be no "margin of error" or other indicator of that uncertainty. The results will most likely vary throughout the day.

I’ll bet it sends someone, somewhere, rushing to the bathroom.


A version of this article appears in print on November 8, 2016, on Page P3 of the New York edition with the headline: Real-Time Projections, Valuable and Risky.
