As data warehouse vendors increasingly move to support in-database analytics, SAS Institute Inc. is looking like the belle of the ball – sought after by multiple dance partners wanting to pull its data mining and predictive analysis applications into their databases.
SAS and Teradata Corp. have been working together for the past two years to integrate SAS applications into Teradata's database. But SAS's in-database analytics ambitions extend beyond Teradata: In the past two weeks, the analytics vendor has divulged similar partnerships with Aster Data Systems Inc. and Netezza Corp.
The integration with the database vendors aims to reduce the need for SAS users to extract data from data warehouses and load it into separate systems for analysis – a goal that both users and analysts say is a laudable one.
"Inherently, it makes sense to do a lot of [analytic] operations closer to your data than to ship large volumes of it around the enterprise," said Merv Adrian, an analyst at IT Market Strategy in Pleasanton, Calif.
Some forms of in-database analytics have been done for years, going back to the advent of functions such as stored procedures and triggers, Adrian said. But having the likes of SAS team up with multiple data warehouse vendors is "a big mainstream win" for the in-database approach, he added.
James Kobielus, an analyst at Forrester Research Inc., said that up to 70% of the time on data mining projects can be spent on preparing data for analysis, "just to get it into a usable state so you can begin building the data models." Being able to run the analytics inside a data warehouse, without having to extract information and move it to a specialized server, should result in "a big productivity benefit" for users, he said.
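To make the extract-versus-in-database distinction concrete, here is a minimal sketch. It assumes SQLite as a stand-in for a data warehouse, and the `sales` table and its columns are hypothetical. It does not use SAS, Teradata, Aster or Netezza software; it only shows that the same answer can be reached while moving fewer rows out of the database.

```python
import sqlite3

# SQLite stands in for the data warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 300.0), ("west", 50.0), ("west", 150.0)],
)

# Extract-then-analyze: every row leaves the database before any math happens.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals = {}
counts = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount
    counts[region] = counts.get(region, 0) + 1
extracted_means = {r: totals[r] / counts[r] for r in totals}

# In-database: the engine aggregates, and only one row per region comes back.
in_db_rows = conn.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region"
).fetchall()
in_db_means = dict(in_db_rows)

rows_moved_extract = len(rows)
rows_moved_in_db = len(in_db_rows)
```

Both paths produce the same per-region averages, but the second returns one row per region rather than one per sale. On a table with billions of rows, that difference is the data movement the vendors say they want to avoid.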
As a result of the integration work done thus far by SAS and Teradata, several of the analytic procedures in the latest SAS 9.2 release can run within Teradata's software. And at Teradata's annual user group conference in Washington last month, the two companies announced a Business Insight Advantage program that lets SAS users use a set of Teradata features.
SAS also plans to enable its analytics functionality to run inside Aster Data's massively parallel database, SAS executives told attendees at the company's M2009 data mining conference in Las Vegas last week. San Carlos, Calif.-based Aster is releasing a new version of its software that lets analytic applications be fully embedded within the database, using "resource containers" to separate data management tasks and analytic processing.
At the TDWI World Conference in Orlando this week, Ken Hausman, product marketing manager for data integration at SAS, declined to comment on the Aster work. According to Aster executives, more details about what the two companies are doing together are expected to be disclosed later this month.
Also at the TDWI conference, officials from data warehouse appliance maker Netezza said they're cultivating a relationship with SAS to push some of its analysis tools into the Marlboro, Mass.-based company's devices. That will augment a new offering dubbed Netezza Extreme that is due to be launched early next year. With that rollout, Netezza plans to add a full implementation of the open-source MapReduce programming technology as well as support for running open-source and user-built analytic algorithms in its appliances.
Hausman did confirm that SAS is working with Netezza. For example, SAS is enabling the data-scoring accelerator included in its Enterprise Miner software to be integrated into Netezza's database engine. And Teradata was only the first partner in an ongoing initiative within SAS to support in-database analytics, he said.
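As a rough illustration of what scoring data inside a database engine can look like, the sketch below registers a scoring function with SQLite so that SQL can call it row by row. Everything here is an assumption made for illustration: the coefficients, the `customers` table and the `score` function are hypothetical, and this is not how the SAS accelerator or Netezza's engine is implemented.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients, standing in for a trained model.
INTERCEPT = -2.0
COEF_VISITS = 0.5


def score(visits):
    """Return a probability between 0 and 1 for one row."""
    z = INTERCEPT + COEF_VISITS * visits
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, visits REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, 0.0), (2, 4.0), (3, 10.0)]
)

# Register the scoring function with the engine so SQL can call it per row.
conn.create_function("score", 1, score)

# Scoring happens inside the query; results land in a table in the same database.
conn.execute("CREATE TABLE scores AS SELECT id, score(visits) AS p FROM customers")
scored = conn.execute("SELECT id, p FROM scores ORDER BY id").fetchall()
```

The scores are written to a table alongside the source data, so no extract has to be handed to a separate analytics server.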
"We're moving down the road of having more database vendors have those capabilities," Hausman said. "We think it's where the industry is going to go. It can provide not only a tremendous amount of processing efficiency by being able to leave data where it is, but also there are going to be cost advantages [for users]."
The cooperation between SAS and Netezza is a welcome development for Jay Kent, a senior solution architect at AstraZeneca PLC's U.S. information services unit in Wilmington, Del. The pharmaceuticals maker expects to move its sales data off an Oracle database and onto a 25 TB Netezza appliance by next April, Kent said at the TDWI conference.
AstraZeneca also has a SAS group that stores monthly extracts of the data on a separate server for trend and predictive analysis. While there are no current plans to change that process, Kent said that being able to do SAS analysis within the Netezza appliance could help ensure that all users are working off a consistent set of data.
"We have business rules in our data warehouse, but the SAS group has [its] own business rules," he noted. "If we get [analytics capabilities] in the database, then we can guarantee that they're seeing the same data the enterprise wants everybody to see."
The in-database approach could also allow AstraZeneca's SAS users faster access to data, according to Kent. Currently, AstraZeneca loads monthly sales information into the Oracle database on a weekend, then does the data extract for the SAS group a day or two later, he said.
An IT architect at a large insurance company that uses SAS said he also sees potential processing and data management benefits from being able to run the analytic applications within a database. "When you take data out of the data warehouse, you lose control of it," said the architect, who asked not to be identified. "It's just a SAS model sitting in somebody's SAS folder. It's not in a database under [IT's] management."
Yet SAS shouldn't expect to have the application side of the in-database analytics market to itself.
Adrian expects IBM to eventually move to incorporate into its DB2 database the data mining technology that it bought when it acquired SPSS Inc. last month. From a competitive standpoint, he said, "SAS is kind of beating its chest a little bit here" with its multipronged initiative, in an attempt to reaffirm its leadership in the data mining market and fire a pre-emptive shot across IBM's bow.
Aster Data announced Version 4.0 of its database on Monday, saying that users can take analytic applications written in Java, C and other programming languages and run them inside the database server. The application logic and data management functions are encapsulated into separate containers so they don't compete with one another for system resources, said Sharmila Shahani-Mulligan, Aster's executive vice president of worldwide marketing.
Existing analytic apps don't have to be rewritten to run inside the database engine, Shahani-Mulligan said. But users do have to add support for MapReduce to applications in order to take full advantage of Aster's massively parallel processing capabilities. Aster is also working on potential in-database analytics partnerships with packaged applications vendors other than SAS, she said.
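For readers unfamiliar with MapReduce, the sketch below shows the general pattern in plain Python: a map step that can run independently on each data partition, a grouping step, and a reduce step that combines values per key. The partitions and function names are hypothetical, and this is not Aster's API; it only illustrates the kind of restructuring an application needs before work can be spread across parallel nodes.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical data partitions, as if each lived on a separate database node.
partitions = [
    [("east", 100.0), ("west", 50.0)],
    [("east", 300.0), ("west", 150.0), ("north", 20.0)],
]


def map_phase(partition):
    """Emit (key, value) pairs from one partition; each call is independent."""
    return [(region, amount) for region, amount in partition]


def shuffle(mapped_partitions):
    """Group values by key across all partitions."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values into a single total."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


totals = reduce_phase(shuffle(map(map_phase, partitions)))
```

Because each `map_phase` call touches only its own partition, a parallel engine could run those calls on separate nodes at the same time and merge the results afterward.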