Before the 2012 presidential election, data science guru Nate Silver famously used an analytical model to correctly predict the winner in all 50 states. His model wasn't so prescient in November 2016: On the morning of Election Day, it gave Hillary Clinton a 71.4% chance of winning the presidency -- up six percentage points from 48 hours earlier. As things turned out, the model made the wrong call in each of the five battleground states that swung the election to Donald Trump.
But Silver almost, kind of, sort of predicted exactly what happened: a popular vote win and electoral college loss for Clinton. In several blog posts about the model's forecast, including a final one published early on Election Day, he outlined that scenario as a distinct possibility, pointing to factors such as the potential for polling errors to erase Clinton's thin leads in key states. In the end, though, the founder and editor in chief of the analytics website FiveThirtyEight stuck with his data-driven approach and wrote that Clinton "is probably going to win, and she could win by a big margin."
This isn't meant to be a critique of Nate Silver and his analytics methods. Yes, he did get it wrong, like just about everyone else who tried to forecast the election. But he came the closest to getting it right among the data scientists making predictions based on what advanced analytics algorithms were telling them. Clearly, he saw something in the data that gave him second thoughts. It appears to be a case in which human intuition nearly trumped (sorry) the ultimately faulty output of an analytical model. And there are lessons to be learned from that for analytics managers and data scientists in the corporate world.
The downsides of analytics data
Data is a wonderful thing, but it isn't infallible. Data sets, especially ones pulled together from different source systems, are bound to include inconsistencies and errors that can send analytics efforts awry if they aren't identified and fixed beforehand. "Noisy" data buries valid information among spurious values that can skew analytical results in misleading ways. And relevant data may never find its way into a predictive model in the first place, leaving data analysts to work with incomplete information.
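To make those failure modes concrete, here is a minimal sketch of the kind of audit an analyst might run before any modeling starts. The records, field names and the plausible age range are all hypothetical, invented for illustration; a real check would be shaped by the actual source systems.

```python
from collections import Counter

# Hypothetical customer records merged from two source systems.
records = [
    {"id": 1, "state": "OH", "age": 34},
    {"id": 2, "state": "Ohio", "age": None},  # inconsistent label, missing age
    {"id": 2, "state": "Ohio", "age": None},  # duplicate of the row above
    {"id": 3, "state": "PA", "age": 212},     # implausible value
]

def audit(rows, key="id", age_range=(0, 120)):
    """Return simple data-quality findings for a list of dict records."""
    id_counts = Counter(row[key] for row in rows)
    low, high = age_range
    return {
        # IDs that appear more than once
        "duplicate_ids": sorted(k for k, n in id_counts.items() if n > 1),
        # Rows where the age field is empty
        "missing_age": sum(1 for row in rows if row["age"] is None),
        # Rows where the age is present but outside the assumed range
        "out_of_range_age": sum(
            1 for row in rows
            if row["age"] is not None and not low <= row["age"] <= high
        ),
        # Distinct spellings in use, so a person can spot inconsistencies
        "state_labels": sorted({row["state"] for row in rows}),
    }

report = audit(records)
print(report)
```

The function only counts and lists problems. Deciding whether "OH" and "Ohio" mean the same thing, or what to do with a 212-year-old customer, is left to a person.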
Data quality problems like these are becoming a bigger concern for analytics teams as more organizations deploy big data systems and predictive analytics tools, including machine learning and artificial intelligence software. In an ongoing survey being conducted by SearchBusinessAnalytics publisher TechTarget, predictive analytics ranks first among business intelligence (BI) and analytics technologies for planned purchases. As of mid-January, 39.6% of the 7,000-plus IT, analytics and business professionals who responded to the survey said their organizations were looking to invest in predictive analytics over the next 12 months.
Don't get me wrong: A data-driven approach is likely to help improve decision-making in companies and to arm business executives, marketing managers and other end users with information they can use to boost business performance. I've talked to IT, BI and analytics managers who are still struggling to get the execs in their organizations to use data analysis tools. Managing business operations based on accumulated knowledge, previous experience and gut feel definitely isn't infallible, either.
Get involved in analytics applications
However, human judgment, on the part of both data analysts and the business users they serve, shouldn't be tossed aside completely. That starts with the data preparation process -- data scientists and data engineers must make sure that they're working with the right data and that it's properly structured and organized for the intended analytics applications. It continues as predictive models are built, trained and then tested to check that they produce valid results. And it certainly applies as data scientists review those results and assess their accuracy, and as the findings are shared with end users.
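One common way to check whether a model produces valid results is to hold back part of the data, fit the model on the rest and then score it on the portion it has never seen. The sketch below illustrates that idea with a deliberately trivial "model" (a single cutoff on one number) and made-up data; the function names and the 25% hold-out size are assumptions chosen for the example, not a recommended setup.

```python
import random

def split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(threshold, rows):
    """Share of rows where 'score >= threshold' matches the recorded outcome."""
    hits = sum(1 for score, outcome in rows if (score >= threshold) == outcome)
    return hits / len(rows)

def train(rows):
    """Pick the cutoff, among scores seen in training, with the best accuracy."""
    candidates = sorted({score for score, _ in rows})
    return max(candidates, key=lambda t: accuracy(t, rows))

# Hypothetical data: (score, outcome) pairs where the outcome is True at 50+.
data = [(score, score >= 50) for score in range(0, 100, 5)]

train_rows, test_rows = split(data)
threshold = train(train_rows)
train_acc = accuracy(threshold, train_rows)
test_acc = accuracy(threshold, test_rows)  # measured on rows the model never saw
print(f"cutoff={threshold} train={train_acc:.2f} test={test_acc:.2f}")
```

A gap between the two accuracy figures is the kind of signal that should prompt a closer look before anyone acts on the model's output.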
Auto insurer Progressive is one company that leans heavily on a data-driven approach in its business decision-making processes. "We want people to have intuition and ideas, but they need to prove them out with data," said Pawan Divakarla, Progressive's data and analytics business leader.
At the same time, though, the company's data scientists spend a lot of time cleaning up data for analysis and then evaluating the accuracy of algorithms and how to improve them. It's crucial that the information generated by the algorithms can be safely relied on in proving out all those intuitions and ideas, Divakarla said.
That mixture of data and the human element is the best recipe for analytics success. In a 2015 conference keynote speech on managing data-driven applications, Nate Silver himself suggested letting data guide the first 80% of the analytics process and then handing over the reins to data analysts and business users so they can give analyses a reality check.
Data can tell you a lot of things -- even the likely winner of a presidential election. But if what it's saying doesn't seem entirely right to you, it's a good idea to listen to those alarm bells going off in your head.