Building predictive models has come to seem like one of the more glamorous jobs in business intelligence and analytics.
Successful, high-profile predictive modelers like Nate Silver, founder of the ESPN blog FiveThirtyEight, and Rayid Ghani, chief data scientist for President Obama's 2012 re-election campaign, can achieve rock star status. But looking beneath the surface, it becomes clear that most of the actual work involved in successful predictive analytics projects isn't so glamorous.
"Everyone wants to build a model or do the business sell job" to help drive organizational decision making, said Jens Meyer, managing director of credit risk, data and portfolio management at The First Marblehead Corp., a student loan provider based in Medford, Mass. The problem is that most people don't think enough about the process of deploying and maintaining analytical models, Meyer added during a presentation at TDWI's The Analytics Experience conference in Boston last week.
For Meyer, building predictive models is only a small portion of the work involved. It's often said that about 80% of analytics is collecting and cleaning data. But Meyer said even that ignores the substantial amount of work required after the data has been prepared and a model built. In order for the model to have a meaningful business impact on an ongoing basis, it needs constant tending after being put into production.
Meyer noted that predictive models don't stay relevant forever. In fact, they often have very short lifespans, at least as originally designed. There are a number of reasons for this. For one thing, the statistical profiles of customer groups or other populations naturally "drift" over time as new data is collected and analyzed, he said. What was average when you first developed a model may soon become atypical. But if you aren't constantly testing and verifying the model, you won't catch that -- and its predictive value will plummet.
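One common way to quantify the kind of drift Meyer describes, particularly in credit-risk settings, is the population stability index (PSI), which compares the distribution of a scoring variable at training time with its distribution today. The sketch below is a generic illustration of that technique, not code from Meyer's team; the bin proportions are made up for the example.

```python
import math

def psi(expected, actual):
    """Population stability index between two binned distributions.

    `expected` and `actual` are lists of bin proportions that each sum
    to 1. A frequent rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    drift, > 0.25 significant drift warranting model review.
    """
    eps = 1e-6  # floor empty bins to avoid log(0)
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical share of applicants in each credit-score band,
# at model-build time vs. in the current applicant pool.
train_dist = [0.10, 0.25, 0.30, 0.25, 0.10]
today_dist = [0.05, 0.15, 0.30, 0.30, 0.20]

drift = psi(train_dist, today_dist)
```

Run on a schedule against production data, a check like this turns "the population drifted" from something discovered after predictions go wrong into an alert raised as soon as the inputs shift.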
Outside influences on model performance
External events can also diminish the effectiveness of predictive models, if they aren't modified to match changing conditions. For example, Meyer said the 2009 housing-market crash threw many of his team's models at First Marblehead into disarray because the metrics they were using to assess household wealth changed abruptly for many people, as home values plunged. His team had to adjust those models to account for the new reality.
For these reasons, Meyer puts an expiration date on his models, meaning they can't be used after a certain date unless they're revalidated. He recommended continuous A/B testing to ensure models in production use are still working as intended, and quickly making fixes if the testing illuminates flaws. "Continuously evaluate the model and see if you can do better," he said. "Your work never stops in that respect."
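Meyer's expiration-date policy can be enforced mechanically rather than by convention. The sketch below shows one way to do that: a wrapper that refuses to score once the model's validity window has passed, until someone explicitly revalidates it. The class and its behavior are illustrative assumptions, not First Marblehead's implementation.

```python
import datetime

class ExpiringModel:
    """Wrap a scoring function with a hard expiration date."""

    def __init__(self, model, valid_until):
        self.model = model          # any callable that scores an input
        self.valid_until = valid_until

    def predict(self, x, today=None):
        today = today or datetime.date.today()
        if today > self.valid_until:
            raise RuntimeError("Model expired; revalidate before use.")
        return self.model(x)

    def revalidate(self, extension_days):
        # In practice this would be gated on fresh A/B test or
        # backtest results, not granted unconditionally.
        self.valid_until += datetime.timedelta(days=extension_days)

# Toy scoring function standing in for a real model.
scorer = ExpiringModel(lambda x: 2 * x, datetime.date(2015, 1, 1))
```

The point of failing loudly is the same one Meyer makes: a stale model silently producing numbers is worse than one that forces the team to re-verify it.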
Businesses should also pay close attention to the makeup of predictive modeling teams and ensure that data scientists don't have unrealistic expectations of the work they'll be doing. Predictive analytics may seem like a highly scientific field, requiring at least a Ph.D., but job candidates who fit that particular bill might not always want to settle for the more mundane task of building predictive models in a business environment, according to another speaker at the TDWI conference.
Clement Brunet, director of research and analytics at Canadian insurance company The Co-operators Ltd., said he generally prefers people with master's degrees over individuals with doctorates. His team is primarily made up of data scientists, business analysts and data visualization engineers. For those roles, higher levels of education can lead people down purely academic analytics paths that often hold relatively little value for a business, Brunet said.
Seeking full predictive modeling value
That approach to hiring is part of a general focus on pulling the most value possible out of each predictive modeling project at The Co-operators, which is based in Guelph, Ontario, and offers a full line of home, auto, life, business and other types of insurance policies.
For example, Brunet talked about a recent project to identify homes that may have structural problems and require in-person inspections prior to the company agreeing to insure them. The process of deciding which homes to inspect before issuing policies used to be somewhat haphazard. But after applying a predictive model to past claims data, Brunet found that homes owned by single men in suburban areas tend to have the most structural problems and, therefore, should be inspected at higher rates. Based on the number of issues the inspections found, he estimated that the four-week model-building project saved the insurer about $500,000 in the first year.
It may not be the kind of attention-getting model that turns data scientists into analytics rock stars, but, for Brunet, predictive modeling initiatives that can have a meaningful impact on a company's bottom line are what businesses should focus on.
"Some people think you're going to do rocket science and theoretical stuff, but when you save the business millions with simple stuff, you're building confidence," he said. "The return on investment of those small things is tremendous."