Don't let a data-driven approach ax judgment from analytics equation

Data analytics can help improve decision-making in organizations. But human intuition and judgment need to be part of the picture to keep predictive models and analytical algorithms from going awry.

Before the 2012 presidential election, data science guru Nate Silver famously used an analytical model to correctly predict the winner in all 50 states. His model wasn't so prescient last November: On the morning of Election Day, it gave Hillary Clinton a 71.4% chance of winning the presidency -- a probability it had increased by six percentage points over the previous 48 hours. As things turned out, the model made the wrong call in each of the five battleground states that swung the 2016 election to Donald Trump.

But Silver almost, kind of, sort of predicted exactly what happened: a popular vote win and electoral college loss for Clinton. In several blog posts about the model's forecast, including a final one published early on Election Day, he outlined that scenario as a distinct possibility, pointing to factors such as the potential for polling errors to erase Clinton's thin leads in key states. In the end, though, the founder and editor in chief of the analytics website FiveThirtyEight stuck with his data-driven approach and wrote that Clinton "is probably going to win, and she could win by a big margin."

This isn't meant to be a critique of Nate Silver and his analytics methods. Yes, he did get it wrong, like just about everyone else who tried to forecast the election. But he came the closest to getting it right among the data scientists making predictions based on what advanced analytics algorithms were telling them. Clearly, he saw something in the data that gave him second thoughts. It appears to be a case in which human intuition nearly trumped (sorry) the ultimately faulty output of an analytical model. And there are lessons to be learned from that for analytics managers and data scientists in the corporate world.

The downsides of analytics data

Data is a wonderful thing, but it isn't infallible. Data sets, especially ones pulled together from different source systems, are bound to include inconsistencies and errors that can send analytics efforts awry if they aren't identified and fixed beforehand. "Noisy" data hides valid information among spurious stuff that can skew analytical results in unsavory ways. Relevant data may never find its way into a predictive model in the first place, leaving data analysts to work with incomplete info.
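One way to picture the kind of check that catches such inconsistencies before they reach a model is the minimal Python sketch below. It is an illustration only: the two source systems (`crm` and `billing`), the record IDs and the field names are all hypothetical and do not come from any system described in this article.

```python
def find_conflicts(source_a, source_b):
    """Compare two {record_id: {field: value}} mappings.

    Returns (conflicts, unmatched):
      conflicts - (record_id, field, value_a, value_b) tuples where both
                  systems hold the record but disagree on a shared field
      unmatched - record IDs present in only one of the two systems
    """
    conflicts = []
    for record_id in sorted(source_a.keys() & source_b.keys()):
        row_a, row_b = source_a[record_id], source_b[record_id]
        for field in sorted(row_a.keys() & row_b.keys()):
            if row_a[field] != row_b[field]:
                conflicts.append((record_id, field, row_a[field], row_b[field]))
    unmatched = sorted(source_a.keys() ^ source_b.keys())
    return conflicts, unmatched


# Hypothetical sample data: the same customers as seen by two systems.
crm = {
    "c1": {"state": "FL", "age": 41},
    "c2": {"state": "OH", "age": 35},
}
billing = {
    "c1": {"state": "FL", "age": 14},  # digits transposed in one system
    "c3": {"state": "PA", "age": 52},
}

conflicts, unmatched = find_conflicts(crm, billing)
print(conflicts)
print(unmatched)
```

A check like this only surfaces the disagreements. Deciding which value is right, or whether a record missing from one system matters, still takes a person who knows the data.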

This issue is becoming a bigger one for analytics teams as more organizations deploy big data systems and predictive analytics tools, including machine learning and artificial intelligence software. In an ongoing survey being conducted by SearchBusinessAnalytics publisher TechTarget, predictive analytics ranks first among business intelligence (BI) and analytics technologies for planned purchases. As of mid-January, 39.6% of the 7,000-plus IT, analytics and business professionals who responded to the survey said their organizations were looking to invest in predictive analytics over the next 12 months.

Don't get me wrong: A data-driven approach is likely to help improve decision-making in companies and to arm business executives, marketing managers and other end users with information they can use to boost business performance. I've talked to IT, BI and analytics managers who are still struggling to get the execs in their organizations to use data analysis tools. Managing business operations based on accumulated knowledge, previous experience and gut feel definitely isn't infallible, either.

Get involved in analytics applications

However, human judgment, on the part of both data analysts and the business users they serve, shouldn't be tossed aside completely. That starts with the data preparation process -- data scientists and data engineers must make sure that they're working with the right data and that it's properly structured and organized for the intended analytics applications. It continues as predictive models are built and then tested -- or "trained" -- to try to ensure that they'll produce valid results. And it certainly applies as data scientists review those results and assess their accuracy, and as the findings are shared with end users.

Auto insurer Progressive is one company that leans heavily on a data-driven approach in its business decision-making processes. "We want people to have intuition and ideas, but they need to prove them out with data," said Pawan Divakarla, Progressive's data and analytics business leader.

At the same time, though, the company's data scientists spend a lot of time cleaning up data for analysis and then evaluating the accuracy of algorithms and how to improve them. It's crucial that the information generated by the algorithms can be safely relied on in proving out all those intuitions and ideas, Divakarla said.

That mixture of data and the human element is the best recipe for analytics success. In a 2015 conference keynote speech on managing data-driven applications, Nate Silver himself suggested letting data guide the first 80% of the analytics process and then handing over the reins to data analysts and business users so they can give analyses a reality check.

Data can tell you a lot of things -- even the likely winner of a presidential election. But if what it's saying doesn't seem entirely right to you, it's a good idea to listen to those alarm bells going off in your head.

How does your organization combine data-driven business decisions and human judgment?

re: "Like just about everyone else who tried to forecast the election.  But he came the closest to getting it right among the data scientists making predictions"

The above quote is obviously a blatant lie and the real black swan. Many "data scientists" accurately predicted the election, especially for PA-MI-WI. If anything, some of them got it wrong in that they also predicted MN for Trump. I received and reviewed their analyses and debated with them over the internet. It was obvious Trump was going to win FL-NC-OH-IA. The accurate analyses were all over the place on NV-CO-NM, but that did not matter to the electoral college. The key things they factored in were the number of people who took the initiative to change their registration from D to R in the rust belt, the number who did it even after the primary, when it made no difference, and the number who went out of their way to announce their change of party on social media. And of course there was the enthusiasm level: big crowds where the crowd inside the stadium counted by the media was dwarfed by the crowd outside the stadium tweeting that they were outside and couldn't get in. The old, wrong data scientists had the right information available to them and consciously chose not to use it because of their own biases, both data scientist biases and political biases.

Old wrong Data Scientist biases include a bias for the past.  Everybody knew that the Trump election was not like the past.  Yet the Data Scientists insisted on using algorithms from the past in their models that Trump had already proven invalid.

This happens in business all the time. I'll use predicting online availability and performance as an example. A Fortune 100 company has to be up online 24/7/365. I base my model and predictions on a combination of analysis of the completeness, accuracy, etc. of the requirements, design and code and external environmental factors, as well as the performance test results in production-like environments. My co-workers, Data Scientists with far more math capability than I have, would look only at the performance test results.

Then they would throw out the "outliers" as they were taught to do in ivory tower statistics classes. Well, a Fortune 100 system that has millions of transactions per hour, maybe per minute, and trillions in a year cannot afford a one-in-a-million outlier event. And sure enough, they happen.
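The arithmetic behind that point can be sketched in a few lines of Python. The volume used here, one million transactions per hour, is an assumed round number for illustration, not a measured figure, and the function names are hypothetical.

```python
import math


def expected_rare_events(transactions, event_probability):
    """Expected number of rare events across independent transactions."""
    return transactions * event_probability


def prob_at_least_one(transactions, event_probability):
    """Probability that the rare event occurs at least once.

    Computes 1 - (1 - p)**n via log1p/expm1 so that it stays accurate
    when p is tiny and n is huge.
    """
    return -math.expm1(transactions * math.log1p(-event_probability))


ONE_IN_A_MILLION = 1e-6
PER_HOUR = 1_000_000       # assumed volume, for illustration only
PER_DAY = PER_HOUR * 24

print(expected_rare_events(PER_HOUR, ONE_IN_A_MILLION))  # about 1 per hour
print(expected_rare_events(PER_DAY, ONE_IN_A_MILLION))   # about 24 per day
print(prob_at_least_one(PER_HOUR, ONE_IN_A_MILLION))     # roughly 0.63
print(prob_at_least_one(PER_DAY, ONE_IN_A_MILLION))      # very close to 1
```

Under those assumptions, a one-in-a-million event is expected about once an hour and is all but certain to show up within a day, so discarding it as an outlier removes something the system will meet routinely.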

But the data scientist also needs to build into his model the things that the business analyst says will never happen. I evaluated the data design of a mission-critical system for an agent-based company that would be piloted in Florida. I risked my job by aggressively stating that the RDB design would not perform (repeating columns that would be heavily updated from nulls to data, one column at a time, and quickly run out of free space), not to mention locks on the reads, the heavy drag on logging and many other bad design factors.

With low test volumes it performed just barely adequately. I insisted on heavy volumes and proved it would not perform. But the business analyst and data scientists insisted it would never experience those heavy volumes, certainly not in pilot.

But I also followed politics and tried to explain it to the data scientists. (On this project all were far left and automatically rejected my libertarian views.) The governor of FL was destroying the insurance market. Most companies would flee FL, leaving my company as one of the few remaining. Being the big dog, it would get an extremely heavy wave of quotes precisely at the time of the pilot.

Sure enough, it did not perform. It cost millions of dollars in CPU to run that inefficient pilot, then tens of millions more when it went US-wide, and millions more to totally redesign it.

The data scientists were biased.  They would not admit their bias even when accurate, logical arguments were presented.  I've seen this repeatedly in big IT shops and Fortune 100 companies.  They get good, accurate, factual, logical advice and refuse to admit reality because they refuse to admit their own bias.

The quote should read "Like just about everyone else in my small Pauline Kael group who tried ..." Pauline Kael famously stated, "nobody I knew was going to vote for him."