The work that goes into building predictive analytics models doesn't stop once the code is written. To ensure that predictive modeling efforts deliver accurate and meaningful results, analytics managers say it's important to continue testing analytical models and reworking them until they're fully reliable -- and to avoid having individual data analysts working on models in isolation.
"We beat up our hypothesis and results," said Eric Haller, executive vice president of Experian DataLabs. "In the end, we want to get the best answer we can."
DataLabs is a business unit within Experian PLC, which is known mostly for its credit rating service. Haller's group develops and runs data analytics applications for the other arms of the company as well as outside clients. The teams of data scientists and predictive modelers in the DataLabs operation generally work on credit risk modeling, targeted marketing campaigns and Web data analytics. And Haller said that unlike places where he has worked in the past, his group takes a very team-centric approach to tackling those problems.
For example, instead of seeking to minimize the number of people involved in a project in order to reduce overhead or speed up development, DataLabs gets as many analysts involved as possible. The idea is that, unlike some other business situations, developing predictive models benefits from a variety of perspectives. One analyst might develop a model, but another might question why certain variables were included or excluded, Haller said, adding that every analyst has to be able to defend the decisions they make during the development process.
"When the goal is building the best model you can, you're better off getting the most smart minds in the room as possible," Haller said.
Size limit on predictive modeling teams
That can be easier said than done. Haller acknowledged that there's a huge talent war underway between companies over the hiring of data scientists. These workers don't come cheap, if they can be found at all, so just getting multiple smart minds into a room can often be a challenge.
But Haller said some of the same things that foster a good environment for building models -- collaboration, intellectual challenges, giving analysts the ability to work on several different types of projects -- tend to be the kinds of things data scientists look for in a working environment. As a result, creating strong conditions for developing predictive analytics models can also help address some recruitment and retention challenges.
The data analysts at New York-based online media company Upworthy take a similar team approach to ensuring the quality and reliability of their analytical models. Speaking at the 2015 Big Data Innovation Summit in Boston, Daniel Mintz, Upworthy's head of data and analytics, said he makes analysts verbally describe to him a real-world scenario that would explain the findings of their models before anything goes into production. He said that forces the analysts to think about the behavior of media consumers, rather than simply putting their heads down and cranking out algorithms. It also can help the analytics team catch potential mistakes before anyone in the company acts on bad information.
Questioning predictive analytics models
For example, Mintz's team recently analyzed some data to see how page load time affects reader engagement, a metric that Upworthy measures based on the amount of time spent on a page, scrolling and other factors. They hypothesized that when pages take a long time to load, people will get frustrated and leave the site before reading the article they had clicked on. What they found, however, was that long page load times actually correlate with high engagement.
The team was ready to scrap the predictive analytics model they had used, assuming there was something wrong with it or with the results it had produced. But when they sat down to verbalize how the results could be explained, they realized that people who stick around while a page takes a long time to load probably really care about the topic, which is why they engage with the content more than in other scenarios. The model turned out to be useful for predicting how likely a reader is to engage with content.
"If you have a [mental] model of how things actually work in the world, you'll see when things don't add up," Mintz said.
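The kind of check Mintz describes, comparing a result against what the team expected to see, can be sketched in a few lines of Python. This is only an illustration, not Upworthy's actual method: the function names, the choice of a Pearson correlation and the numbers are all assumptions made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def contradicts_hypothesis(r, expected_sign):
    """True when the observed correlation has the opposite sign to the expected one."""
    return r * expected_sign < 0


# Hypothetical, invented values for illustration only (not Upworthy data).
load_seconds = [1.0, 2.0, 3.0, 4.0, 5.0]
engagement_seconds = [20.0, 25.0, 31.0, 38.0, 44.0]

r = pearson_r(load_seconds, engagement_seconds)

# The team's starting hypothesis: slower pages mean less engagement (negative sign).
if contradicts_hypothesis(r, expected_sign=-1):
    print("Result contradicts the hypothesis; explain it before acting on it.")
```

A flag like this does not say whether the model or the hypothesis is wrong. It only marks the point at which the team should stop and talk the result through, as Mintz's analysts did.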