Doing in-memory analytics on pools of big data isn't just a load-and-go process. In-memory big data deployments raise a raft of IT architecture issues for organizations to sort through before getting started, including system design, scalability and still-evolving data integration requirements.
Mapping out the proper hardware infrastructure is one of the first considerations. To support in-memory analytics tools, companies must invest in robust, memory-intensive servers. They also must decide on the best approach for scaling the systems as analytics needs and big data volumes expand, said John Myers, a business intelligence (BI) and data warehousing analyst at research and consulting company Enterprise Management Associates Inc. in Boulder, Colo.
"One of the biggest architectural decisions when it comes to hardware is whether to scale up into a single piece of big data iron or scale out across multiple machines," Myers said. Deploying one large server "means less care and feeding" on the part of systems administrators, according to Myers. But his usual recommendation is a scale-out approach -- for example, a cluster of commodity servers. "If you scale up into one big box and that box fails, you're done," he said. "By scaling out, your points of failure are distributed across multiple nodes."
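As a rough illustration of Myers' point, the sketch below compares how much in-memory capacity survives a single node failure under each approach. The function name, the 1,024 GB total and the eight-node cluster are hypothetical values chosen for the example, and it assumes memory is split evenly across nodes with no replication.

```python
def capacity_after_failure(total_memory_gb: float, node_count: int,
                           failed_nodes: int = 1) -> float:
    """Return the memory (in GB) still available after some nodes fail.

    Assumes memory is divided evenly across nodes and nothing is replicated
    (a simplification made only for this illustration).
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if not 0 <= failed_nodes <= node_count:
        raise ValueError("failed_nodes must be between 0 and node_count")
    per_node_gb = total_memory_gb / node_count
    return per_node_gb * (node_count - failed_nodes)


# Hypothetical deployment: 1,024 GB of memory in total.
scale_up_remaining = capacity_after_failure(1024, node_count=1)   # one big box
scale_out_remaining = capacity_after_failure(1024, node_count=8)  # eight-node cluster

print(f"Scale-up, one failure:  {scale_up_remaining} GB left")
print(f"Scale-out, one failure: {scale_out_remaining} GB left")
```

Under those assumptions, losing the single large server leaves nothing running, while losing one of eight nodes leaves seven-eighths of the memory available. Real clusters also have to redistribute or reload the data held on the failed node, which this sketch ignores.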
Locating high-capacity servers stocked with the maximum amount of memory as close as possible to where business users are working can also aid in reducing latency on in-memory analytics applications, said Jeff Boehm, vice president of global marketing at BI and analytics software vendor Qlik Technologies Inc. in Radnor, Pa.
Data persistence pays off?
Another factor to consider, Boehm said, is whether there's a need for the in-memory technology to support data persistence in order to prevent information from being lost if a system crashes or an analytics process is interrupted. He added that organizations should also be aware of the data-size limits of different in-memory analytics tools when evaluating the available software options.
In-memory system scalability is a long-term concern for online humor network Cheezburger Inc., which has paired QlikTech's QlikView software with a Hadoop-based big data environment to glean real-time insights into the online activities of website visitors. That enables it to more effectively tailor content for individual users, said Loren Bast, who was director of BI at Cheezburger until leaving the Seattle-based company in April. But as the BI team looks to expand the in-memory big data analytics system to a growing number of Cheezburger workers, Bast acknowledged that there are worries about the cost of in-memory processing.
"While [memory] is not expensive anymore, it's still not free," he said. "We have a few high-memory servers running this, but they're hitting their limitations in terms of how much data we can load in. Also, now we have dozens of users using the system and have no issue keeping the hardware and software costs in line with value. But what happens when we extend out the reporting to more users?" Bast added that both system infrastructure and software licensing costs can "become quite high real quickly when supporting thousands of [business users]."
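The cost concern Bast describes can be sketched as a back-of-the-envelope estimate. Everything in the example below is an assumption made for illustration: the function names, the per-server and per-user prices, and the data volume are placeholders, not figures from Cheezburger or any vendor. It also assumes licensing is charged per user and that data volume stays constant as users are added.

```python
import math


def servers_needed(data_gb: float, usable_memory_per_server_gb: float) -> int:
    """Number of servers required to hold the whole data set in memory."""
    if usable_memory_per_server_gb <= 0:
        raise ValueError("usable_memory_per_server_gb must be positive")
    return math.ceil(data_gb / usable_memory_per_server_gb)


def estimated_cost(users: int, data_gb: float, *,
                   usable_memory_per_server_gb: float,
                   cost_per_server: float,
                   license_cost_per_user: float) -> float:
    """Hardware cost plus per-user licensing (both placeholder pricing models)."""
    servers = servers_needed(data_gb, usable_memory_per_server_gb)
    return servers * cost_per_server + users * license_cost_per_user


# Placeholder inputs, not real prices or volumes.
pricing = dict(usable_memory_per_server_gb=256,
               cost_per_server=20_000,
               license_cost_per_user=1_000)

for users in (50, 5_000):
    total = estimated_cost(users, data_gb=600, **pricing)
    print(f"{users:>5} users: estimated cost {total:,.0f}")
```

With these placeholder numbers the hardware bill stays flat while the licensing line grows in step with the user count, which is the pattern behind Bast's worry about supporting thousands of business users.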
Stay flexible on in-memory, big data tools
Flexibility and data integration are other issues that should be on the radar screen of BI and analytics managers looking to support analysis of big data in memory. In-memory analytics and big data technologies alike are still relatively new and continue to evolve. For example, a system architecture might eventually need to encompass more than Hadoop data stores, even though that's the technology most closely associated with big data.
"It's important that people invest in tools and architectures that are very flexible," Boehm said. "Requirements will change; user needs will change. And if you lock yourself into one architecture that requires only one mode of working, you're locking out options that may be required as other opportunities come up down the road."
ContactLab, an email marketing services provider in Milan, Italy, has already recognized that fact. The company, which has offices in five European countries, is loading email campaign and related Web activity data from a Hadoop system into SAS Institute Inc.'s Visual Analytics software for in-memory analysis. But long term, it expects to add forms of relational data to the in-memory analytics mix, said Massimo Fubini, ContactLab's founder and managing director. For example, he cited transactional data about the use of customer loyalty cards.
"I'm not thinking that Hadoop is going to be the only solution in the future," Fubini said. "Our relational data is still really important, and I want the ability to mix together the two environments. The future is all about doing data analytics -- it's not about the software. So the real challenge is to [create an environment] where you can change your mind."
About the author:
Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for more than 25 years.