ORLANDO, Fla. -- SAS Institute Inc. is throwing its hat into the appliance ring, aiming to go head-to-head with the big data challenge.
At its Analytics 2011 Conference Series, SAS announced plans to tackle the reams of data growing in velocity, variety and volume -- or “big data” -- with a high-performance computing platform. The appliance, which packages SAS analytics software together with hardware from its partners Teradata and EMC Greenplum, combines in-memory analytics with in-database technology. The result is a co-location model enabling users to perform more complex analytics, said Oliver Schabenberger, senior research statistician for SAS, during the conference’s opening keynote address.
“The No. 1 performance killer still is our lack of ability to move the data around,” Schabenberger said. “So the No. 1 strategy for us in high-performance computing is co-location: Bring analytics and data together.”
While the idea of joining the two together isn’t a new one, the approach is. SAS is banking on software that takes advantage of the resources already offered by computing platforms, Schabenberger said. Specifically, SAS is drawing on commodity blades -- compact servers that, when combined, can provide terabytes of memory and storage as well as multi-threading, the ability to execute different parts of a program at the same time.
“Instead of involving the disk, why do I not just lift the data up into all this memory, leave it there and analyze in-memory,” Schabenberger said.
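The idea Schabenberger describes can be sketched in miniature. The following is a toy illustration of the in-memory approach -- an assumption-laden sketch, not SAS's implementation: instead of re-reading the data from disk on every analysis pass, load it once and analyze it repeatedly in memory.

```python
# Toy sketch (not SAS's implementation): disk-bound vs. in-memory analysis.

def analyze(values):
    # Stand-in "analytic": the mean of the values.
    return sum(values) / len(values)

def from_disk(path, n_passes):
    results = []
    for _ in range(n_passes):              # every pass re-reads the file
        with open(path) as f:
            values = [float(line) for line in f]
        results.append(analyze(values))
    return results

def in_memory(path, n_passes):
    with open(path) as f:                  # one read from disk...
        values = [float(line) for line in f]
    return [analyze(values) for _ in range(n_passes)]  # ...many analyses
```

Both functions return identical results; the only difference is how often the disk is involved -- which, per Schabenberger, is the No. 1 performance killer.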
SAS is not alone in moving toward in-memory analytics. SAP has been pushing HANA, its own in-memory analytics appliance, and at OpenWorld this year, Oracle announced Exalytics, an in-memory analytics appliance based on its TimesTen database.
However, in-memory analytics was only part of the answer: SAS still needed a way to manage the data efficiently and to do so in parallel. Enter EMC Greenplum and Teradata, data management vendors that offer blade-based, massively parallel processing (MPP) environments and have agreed to partner with SAS in building the high-performance analytics appliances.
“In our new model, we’re running side-by-side with the database process,” said Schabenberger, drawing a distinction from in-database analytics alone. “There’s a SAS process and a database process. They like each other and they talk to each other. But the SAS process is not limited by the abilities the database can provide to a computation.”
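The division of labor Schabenberger describes -- partial computations running alongside each data partition, with only the small results moved and combined -- can be illustrated with a hypothetical sketch (not SAS code; the function names are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of MPP-style co-located analytics: each worker
# computes a partial result on its own data partition (as a process
# running beside the database would on local data), and only the tiny
# partial results -- not the data -- are shipped and combined.

def partial_stats(partition):
    # Runs "next to" the data; returns just (count, sum).
    return (len(partition), sum(partition))

def parallel_mean(partitions):
    # One worker per partition, mimicking one analytic process per blade.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_stats, partitions))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count
```

For example, `parallel_mean([[1.0, 2.0], [3.0, 4.0, 5.0]])` returns 3.0 without ever gathering the two partitions in one place -- the co-location principle in miniature.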
The announcement left some conference attendees with questions.
“I like it quite a bit,” David Cedillo, a finance manager for the Irving, Texas-based TXU Energy, said after a demonstration of the product. “My question was, What databases does it work with?”
Cedillo said TXU Energy is interested in investing in a high-performance program, but because the company doesn't use either of the two database vendors SAS has partnered with, the new appliance isn't something it would invest in right now. Instead, the company is considering Oracle Exadata.
“I hope they work on releases beyond EMC Greenplum and Teradata,” he said.
SAS promised customers who invest in the near-real-time appliance that there will be no disruption to the software they're used to. The offering will be available in December.
“Your interaction with the software will not change,” Schabenberger said, adding that SAS wants the appliance to meet business needs for the breadth of analytics from hindsight to foresight. “The difference is performance and speed.”
What does this mean for SAS?
The announcement may not come as a complete surprise to SAS customers. At a conference last year, Jim Goodnight, CEO of SAS, mentioned the technology in an address. But what does it mean for SAS?
“It’s a good announcement,” said James Kobielus, senior analyst at Cambridge, Mass.-based Forrester Research Inc. “It essentially puts SAS squarely into the advanced analytics solution appliance market with its own (partner-reliant) offerings.”
Yet for businesses interested in entering the advanced analytics arena, the appliance model isn’t the only one out there, Kobielus said. There are cloud, Software as a Service (SaaS) and traditional software approaches as well.
“More vendors in the advanced analytics market are supporting all three go-to-market solution delivery approaches,” he said. “The trend in the past several years has been toward fully integrated hardware-software appliances.”
The cloud or SaaS approach, he added, is coming on strong.
“The appliances offer quick time-to-deployment value, high performance and low cost,” he said. “The cloud or SaaS model has the potential to outshine appliances in those areas in analytics, but that trend will take several years to play out.”
No need for samples
One potential attraction of the appliance approach is its ability to scale back on sampling, a technique used to find a smaller but representative subset of the whole data set, typically when performing predictive analytics.
As data sets balloon -- in volume as well as in width and length -- and the number of variables on which to test the data grows, sampling can sometimes produce less-than-desired results.
“If the size of your data is choking your analytics, the problem is not that you have too much data,” Schabenberger said. “It’s that you have the wrong analytics environment.”
For businesses, a sample that fails to represent the full data set could mean missing undiscovered patterns at the granular level -- the transaction level in retail or the patient level in health care, for example.
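A toy illustration, unconnected to SAS's product, shows why rare, granular patterns are at risk under sampling: if only 100 of 100,000 transactions are fraudulent, a modest random sample may contain almost none of them, while analysis of the full data set always sees all 100.

```python
import random

# Toy illustration (not tied to SAS's product): a rare pattern --
# 100 fraud cases among 100,000 transactions (0.1%) -- can nearly
# vanish from a random sample.

random.seed(42)                            # reproducible draw
population = [1] * 100 + [0] * 99_900      # 1 marks a fraudulent transaction

sample = random.sample(population, 500)    # a 0.5% sample

full_count = sum(population)               # the full data set finds all 100
sample_count = sum(sample)                 # the sample likely finds 0 or 1
```

With an expected count of only 0.5 fraud cases in a 500-row sample, an analyst working from the sample could easily conclude the pattern doesn't exist at all.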
But making the case for moving away from sampling could be a tough sell, especially among the analysts who use the tool.
“Show me the benefit of using the entire data set versus sampling,” said Sylvain Lanthier, a senior analyst at Ontario-based Canadian Tire Financial Services. “Sampling should be representative of the whole population.”
Lanthier said he wouldn't form an opinion on the product as a whole until he had more information. “I’ve never seen it in action,” he said. “Theoretically, it sounds great.”