Guide to big data analytics tools, trends and best practices
A comprehensive collection of articles, videos and more, hand-picked by our editors
In the ongoing effort by companies to tease tangible business value out of agglomerations of big data, in-memory analytics tools offer a possible path to unlocking insights that can spark operational improvements and point the way to new revenue opportunities.
Unlike conventional business intelligence (BI) software that runs queries against data stored on server hard drives, in-memory technology queries information loaded into RAM, which can significantly accelerate analytical performance by reducing or even eliminating disk I/O bottlenecks. Consultants and experienced users say the resulting speed boost is particularly compelling for big data analytics applications involving complex what-if scenarios and large amounts of information from a variety of data sources.
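The performance difference comes down to where the data lives when a query runs. As a rough illustration only -- real in-memory analytics engines add columnar storage, compression and parallel execution on top of this idea -- the sketch below uses Python's built-in sqlite3 module with an in-memory database, so the query touches RAM rather than disk. The table and figures are hypothetical.

```python
import sqlite3

# Hypothetical page-view data; in a real deployment this would be
# loaded into memory from disk-based repositories at startup.
rows = [("home", 120), ("lolcats", 340), ("fail", 210)]

# ":memory:" tells SQLite to keep the entire database in RAM,
# so queries involve no disk I/O at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (page TEXT, views INTEGER)")
conn.executemany("INSERT INTO pageviews VALUES (?, ?)", rows)

total = conn.execute("SELECT SUM(views) FROM pageviews").fetchone()[0]
print(total)  # 670
```

The same SQL run against a file-backed connection would pay disk-read costs on cold data; keeping the working set resident in memory is what removes that bottleneck.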
"The biggest benefit to in-memory analytics is speed of analysis and exploration," said Cindi Howson, founder of BI Scorecard, a research and consulting company in Sparta, N.J., that publishes technical evaluations of BI and analytics tools. The data latency that often bogs down traditional BI querying "interrupts the whole thought process" for business users, Howson said. She cited analytical flexibility as another in-memory analytics plus: "With in-memory tools, users can ask business questions they could never ask before because the technology was too slow."
That's the case at Cheezburger Inc. The Seattle-based operator of humor websites that attract a total of 500 million page views a month is getting good results from an in-memory big data analytics initiative, according to Loren Bast, who was Cheezburger's director of BI until he left the company in April.
Deep dive into too much data
Initially, Cheezburger stumbled in trying to track and analyze data about its online traffic in an effort to discern user behavior patterns. "We jumped into the deep end of the pool with big data, and we were certainly doing it big, just not doing it right," Bast said while he was still at the company. Only 10% of the data it was capturing ended up being relevant to the analytics program and clean enough to be trustworthy, he added.
The BI team regrouped, turning to Qlik Technologies Inc.'s QlikView in-memory analytics software for use against specific data sets stored in Hadoop and other repositories. Bast said the in-memory system has given Cheezburger's business users far more flexibility for creating queries on the fly and joining together information from disparate data sources to get answers to their business questions.
"Without in-memory, it was really tedious to build reports, especially dynamic, very customized reports," he said. "Now we can solve reporting needs much faster than we used to." That enables users "to get away from the drudge work" and spend more time acting on the traffic data than analyzing it, Bast added.
In-memory analytics tools might make it easier for organizations to capitalize on growing volumes of big data, but that doesn't mean the combination comes without challenges. The relatively high cost of RAM compared with disk storage has been a barrier to adoption, as have scalability issues related to the memory constraints of servers, Howson said. Those concerns have been somewhat alleviated by falling memory prices and the growing availability of 64-bit systems supporting significantly expanded memory capacities -- but they linger.
Good governance needed on in-memory analytics
In addition, data governance is an issue that organizations will need to address as more and more business users get access to in-memory applications, said Tapan Patel, global product marketing manager for predictive analytics and data mining at software vendor SAS Institute Inc. "You have to avoid a scenario where multiple data silos appear," he said. "Closer integration of in-memory analytics tools with the traditional data layer is going to be critical to avoiding data replication."
Seamless connectivity with Hadoop -- the open source technology that has become nearly synonymous with big data because of its ability to cost-effectively store massive amounts of structured and unstructured data -- is one of the critical integration points for enabling analysis of big data in memory. "In-memory analytics and Hadoop are very complementary technologies, and in most cases they will both have a place [in big data environments]," said John Appleby, head of consulting on deployments of SAP AG's HANA in-memory computing appliance at Bluefin Solutions, a London-based consultancy and systems integrator.
But the linkages between Hadoop systems and in-memory tools are still relatively immature, according to Appleby. He said Hadoop's flexibility for handling unstructured data in a schema-less fashion stands in direct contrast to in-memory software's need for some level of structure in the data being analyzed. "The types of [data] models created in the two worlds don't look the same," Appleby said. "You have two different foundations in which you need a single view, and no one really has the answer yet. This is a problem organizations are only just starting to deal with."
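The gap Appleby describes -- schema-less records on one side, structure-hungry in-memory tools on the other -- is typically bridged by imposing a schema at load time. The sketch below, with invented event records and column names, shows the general idea: pick the fields the analysis needs and fill the gaps with defaults before building an in-memory table.

```python
import json

# Hypothetical schema-less event documents, as they might land in Hadoop;
# each record can carry a different set of fields.
raw_events = [
    '{"page": "fail", "user": "a1", "comments": 42}',
    '{"page": "home", "user": "b2"}',
    '{"page": "fail", "referrer": "search", "comments": 3}',
]

# Impose a fixed schema: the columns the analysis needs, in order.
COLUMNS = ("page", "user", "comments")

def to_row(doc):
    rec = json.loads(doc)
    # Missing fields become None, giving every row the same shape.
    return tuple(rec.get(col) for col in COLUMNS)

table = [to_row(d) for d in raw_events]

# With structure imposed, in-memory aggregation is straightforward,
# e.g. total comments per page.
comments_by_page = {}
for page, _user, comments in table:
    comments_by_page[page] = comments_by_page.get(page, 0) + (comments or 0)
print(comments_by_page)  # {'fail': 45, 'home': 0}
```

The hard part in practice is deciding that column list up front -- which is exactly the modeling mismatch Appleby points to.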
That isn't stopping Cheezburger. Bast said the company is using QlikView in conjunction with Hadoop to determine what data to look at and then to analyze the information in an effort to improve content planning and detect anomalies that might point to technical or promotional problems -- for example, a piece of content that has a large number of comments but doesn't get a lot of traffic. The result, he added, is less waiting around for queries to run their course: "It's made our decisions much faster."
Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for more than 25 years.