In-database predictive analytics helps marketer predict consumer behavior

In-database analytics -- the blending of data warehousing and predictive analytics -- helps Catalina Marketing foretell consumer behavior.

Thanks to the blending of advanced analytics and data warehousing technology, it doesn’t take long for Catalina Marketing to predict what you’re likely to buy the next time you visit your local Safeway or Walgreens.

At the root of Catalina’s business are two separate but converging technologies.

The first is data warehousing technology that allows Catalina to integrate, transform and store nearly 800 billion rows of customer data that details the purchase history of around 200 million Americans over the past three years.

The second is predictive analytics technology that resides in the data warehouse and allows Catalina to run its scoring models against the huge data set without having to move it to a separate application.

Commonly called in-database analytics, the convergence of data warehousing and advanced analytics technology continues to pick up pace. For example, the latest iteration of Netezza’s TwinFin data warehouse appliance, released last week at the vendor’s user conference in Boston, includes new extensions to help developers build advanced analytics with technology from SAS Institute, the MapReduce framework and the R predictive analytics language.

SAS is also working with data warehouse vendors Teradata and Aster Data on integrating its analytics technology inside the database. And IBM recently acquired advanced analytics specialist SPSS, with which it had already begun offering in-database analytics with its DB2 and WebSphere products.

In-database analytics run scoring models faster
The main benefit of in-database analytics technology for companies like Catalina is eliminating the need to move and transform data from a database to a separate analytics application, saving valuable time and effort, said Phil Francisco, vice president of marketing at Netezza.

Eric Williams, Catalina’s chief information officer, agreed, noting that in-database analytics has reduced the length of time it takes the company to run its predictive scoring models by 30% to 40%. That means Catalina can run its scoring models against its vast amounts of data more often, he said, improving the models’ accuracy and its clients’ bottom lines.

“To do that, I need to spin data at speeds people don’t even think about,” Williams said. With in-database technology from Netezza and SAS, Catalina is able to run scoring models for each of its clients over 600 times a year, he said.

“Moving, in this case, well over 50 terabytes of information out into a data mining piece of technology, running these scores and things, and then moving it back in [to the data warehouse] could take weeks, and in fact it did in the early days,” Williams said. “We now can do it in minutes because we’ve moved the technology next to the database.”
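
The contrast Williams describes can be illustrated with a minimal sketch. It assumes a toy SQLite table and a hypothetical linear scoring model; the table name, column names and weights are invented for illustration and do not reflect Catalina's, Netezza's or SAS's actual schema or models. The first function extracts every row and scores it in the application, while the second pushes the model into the query so that only the scores leave the database.

```python
import sqlite3

# Hypothetical linear scoring model; the weights are illustrative only.
WEIGHTS = {"visits": 0.5, "spend": 0.02}


def build_demo_db():
    """Create a tiny in-memory table standing in for the data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (consumer_id INTEGER, visits INTEGER, spend REAL)"
    )
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?, ?)",
        [(1, 4, 100.0), (2, 10, 250.0), (3, 1, 20.0)],
    )
    return conn


def score_outside_database(conn):
    """Move every row out of the database, then score it in the application."""
    rows = conn.execute(
        "SELECT consumer_id, visits, spend FROM purchases"
    ).fetchall()
    return {
        cid: WEIGHTS["visits"] * visits + WEIGHTS["spend"] * spend
        for cid, visits, spend in rows
    }


def score_in_database(conn):
    """Evaluate the model inside the query; only the scores are returned."""
    sql = "SELECT consumer_id, ? * visits + ? * spend FROM purchases"
    return dict(conn.execute(sql, (WEIGHTS["visits"], WEIGHTS["spend"])))
```

Both functions return the same scores. At three rows the difference is invisible, but at the tens of terabytes Williams cites, the extract step in the first function is the part that he said once took weeks.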

Predictive analytics tools let companies target the right customers
Catalina, based in St. Petersburg, Fla., collects point-of-sale (POS) data at more than 25,000 grocery stores, pharmacies and other retail outlets for its consumer products and pharmaceutical customers. It assigns an ID number to each consumer and tracks that consumer’s buying patterns.

From there, the data is aggregated and fed into a data warehouse, where Catalina applies scoring models to the data to identify trends in buying behavior. The models help Catalina determine what consumers are likely to buy the next time they go to the grocery store or pharmacy, insights the company passes on to customers such as Pfizer and Procter & Gamble.

So, after shoppers finish up at the checkout lane, they are presented with a personalized coupon for a free carton of milk or recommendations for over-the-counter pain relievers based on their previous buying behavior, Williams explained.

“So if you’re taking a statin for high blood pressure, it could be an opportunity to understand more about blood pressure cuffs that are available in the store,” he said. “It could be for an over-the-counter product, it could be for other services and products that are going to help you with … taking your medication, side-effect alleviation, those types of things.”

Being able to predict what a person is likely to buy next, and being the first vendor to make contact with an offer for that product, is critical for consumer products companies and drug makers, Williams said. Based on historical data, consumers who like a new product stick with it for around 18 months on average, he said.

“If we can predict somebody who’d be interested in a product, give them the opportunity through a promotion that’s free -- here’s a free one, go pick it out of the Stop & Shop store and try it. If you like it, guess what? We just signed up a new customer for one of our clients,” Williams said.

Is real-time reporting in the near future for predictive, in-database analytics?
The marriage of data warehousing and analytics even holds the potential to run predictive analytics against large amounts of data in near real time.

At Catalina, for example, scoring models are run against historical data and then applied to consumers who meet certain requirements at the time of checkout. Williams envisions a day when the models are run in near real time against data from the consumer’s just-completed POS transaction.

“That is where we’re going,” Williams said. “It is our belief that we can actually get to real-time scoring based upon data in that transaction and the historical [data] in such a speed that we can respond within two seconds of the transaction, which is how fast we need to be.”
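
As a rough illustration of the idea, the sketch below combines a precomputed historical score with items from the just-completed transaction and checks the result against the two-second window Williams mentions. Everything except that two-second figure is an assumption made for the example: the consumer IDs, scores, basket weights, threshold and offer name are hypothetical, and the simple additive model stands in for whatever scoring technique Catalina actually uses.

```python
import time

# From the article: a response is needed within two seconds of the transaction.
RESPONSE_BUDGET_SECONDS = 2.0

# Hypothetical batch output: consumer ID -> score computed from historical data.
HISTORICAL_SCORES = {101: 0.35, 102: 0.80}

# Hypothetical adjustments for items in the just-completed transaction.
BASKET_UPLIFT = {"cereal": 0.30, "coffee": 0.10}


def score_transaction(consumer_id, basket):
    """Combine the historical score with signals from the current basket."""
    base = HISTORICAL_SCORES.get(consumer_id, 0.0)
    uplift = sum(BASKET_UPLIFT.get(item, 0.0) for item in basket)
    return min(1.0, base + uplift)  # cap the combined score at 1.0


def choose_offer(consumer_id, basket, threshold=0.6):
    """Return an offer only if the score clears the threshold in time."""
    started = time.perf_counter()
    score = score_transaction(consumer_id, basket)
    offer = "milk coupon" if score >= threshold else None
    elapsed = time.perf_counter() - started
    if elapsed > RESPONSE_BUDGET_SECONDS:
        return None  # too late to print at the register; skip the offer
    return offer
```

In this sketch, a consumer whose historical score alone falls below the threshold can still qualify for an offer once the current basket is taken into account, which is the practical difference between batch scoring and scoring at the time of the transaction.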
