They say the only thing certain in life is uncertainty, but there are at least some things we can predict with absolute confidence. I know with 100% surety, for example, that when I tell my 4-year-old daughter it's time for bed, she'll say she isn't tired. My certainty of this response allows me to prepare for the resistance with a strategy that involves firm demands; some cajoling; and, ultimately, bribery.
Unfortunately, customer behaviors are much harder to predict, and that uncertainty can mean the difference between the success and failure of new products or services. Our need to control outcomes, in life and business, has placed us smack dab in the middle of today's data-driven world. We now have an arsenal of predictive analytics tools at our disposal to forecast outcomes with greater certainty than ever before. These tools can validate business leaders' gut instincts to invest in a promising new product or dissuade them from launching a product likely to go the way of Microsoft's Zune.
But it's a tricky proposition to decide how much trust to put into the data -- especially when the data contradicts long-held beliefs. Ian Swanson, CEO and co-founder of DataScience Inc., says data analysis is a way to prove out a thesis; it provides the evidence to back the thesis up, invalidate it or point in an entirely new direction. "Sometimes data science just proves the assumption," Swanson notes. "Then there's the case where we've found a piece of gold, and we need to prove it's worth exploring more. There's value in both scenarios."
Predicting the unpredictable
Now that I've touted the value of predictive analytics tools, I'll caution that overreliance on them can be just as damaging as ignoring them. As we explain in this month's cover story of Business Information, the insights are only as good as the data that feeds them. Statistician George Box famously wrote, "All models are wrong, but some are useful." And any data scientist will tell you the way to make predictive models useful is by putting business users at the table, along with the engineers who will translate predictive insights into new products.
"There is never going to be a silver bullet with some single software panacea; it will always take expertise in data and in the business," says Kenneth Sanford, a lead analytics architect with Dataiku and an adjunct professor at Boston College.
Indeed, there is no substitute for expert knowledge, but even when the right people are involved with analytics projects, the predictions can be dead wrong -- as we saw in the 2016 U.S. presidential election. Former Secretary of State Hillary Clinton's campaign reportedly worked with a team of data scientists led by chief data analyst Elan Kriegel, co-founder of BlueLabs, who was involved with the successful 2012 Obama campaign. Clinton's campaign relied on data analysis to inform strategic decisions, including where to court potential voters and where to place ads. Those insights were supposed to provide a competitive advantage over presidential candidates who didn't invest in predictive technologies. We all know how that turned out.
A request for comment from Kriegel went unanswered, so we don't know whether predictive models helped or hurt the campaign. But there were certainly other moving parts that derailed the campaign during the final weeks -- factors that perhaps not even the best predictive models could have anticipated.
It's all about integrity
That's not to say companies should toss away their predictive analytics projects and return to the old way of doing business. Data scientists assert the problem with Clinton's strategy wasn't the data-driven approach or the predictive model; rather, it was the integrity of the data that fed the algorithms -- a problem that's all too common in predictive modeling, Sanford says. "The funny thing about using machine learning in the election is that, probably, if they used traditional statistics, the predictions would have been more accurate," he adds. The reason, he explains, is that statisticians spend a lot of time figuring out how the data was generated and the meaning behind each variable, while predictive machine learning cares very little about how data is created. Had analysts spent more time scrutinizing who answers polls, we might not have seen the same kind of "Dewey Defeats Truman" assumptions in 2016 that sprang from rudimentary election polling of 68 years ago.
"It isn't just about getting the machine model and slapping some algorithm on a data set to figure out what will happen," DataScience's Swanson says. "With the election, I agree with experts that the problem was there were voices missing in the data; there were problems with the integrity of the data from a validation standpoint."
Inevitably, when asked, people might say they're in favor of saving parks and the environment, for example, but they may not actually vote that way. It isn't the predictive model that's the core of the problem so much as the questionable quality of the sample. This problem, Swanson explains, goes to some of the common pitfalls of artificial intelligence, predictive analytics and data science projects as a whole -- working with reliable data, choosing the right algorithms and truly knowing the problem that needs to be solved.
As explained in another feature, data integrity is the Achilles' heel of predictive analytics. Get the data prep right, and you're on your way. In that story and elsewhere in this issue, you'll hear from companies that are doing predictive analytics correctly and reaping rewards in the form of new customers, more revenue, better efficiency and stronger insights to help them make smart decisions.