As if real-time business intelligence and analytics projects aren't challenging enough, throw the element of "big data" into the mix and you inject yet another dimension of complexity for BI professionals and analytics software vendors to grapple with.
On its own, big data -- the large and varied pools of structured transaction data and unstructured information coming from sources such as social networks, Web server logs and sensors -- has many BI and data warehousing teams scrambling to get a handle on a new category of technologies and still-developing deployment and management methodologies. Trying to incorporate big data into real-time analysis systems could make them even more frazzled.
"Real-time business intelligence for big data is definitely more complicated," cautioned Lyndsay Wise, president and founder of Toronto-based consultancy WiseAnalytics. "You're analyzing more types of data, and with that comes a more complex architecture."
One of the biggest challenges in applying real-time BI and analytics processes to big data is that the initial round of big data management technologies is inherently not well suited for real-time operations, according to Wise and other analysts. Take Hadoop, for example. One of the most talked-about and widely used big data tools, Hadoop is an open source framework born from code developed by Internet giants Google and Yahoo to manage their massive volumes of data in a distributed fashion on large clusters of commodity hardware. But Hadoop's principal technologies, MapReduce and the Hadoop Distributed File System, are both oriented to batch processing.
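To make the batch-versus-real-time distinction concrete, here is a minimal sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the event records, the `batch_count` function and the `RunningCount` class are all hypothetical names invented for illustration. The first function mimics the map, group and reduce phases of a batch job, which produces an answer only after scanning the whole data set. The class keeps its totals in memory and updates them as each event arrives, which is closer to what a real-time system has to do.

```python
from collections import Counter, defaultdict


def batch_count(events):
    """Batch style: process the full data set in one pass of phases.

    No result is available until every phase has finished.
    """
    # Map phase: turn each record into a (key, 1) pair.
    mapped = [(event["page"], 1) for event in events]

    # Group phase: collect all values that share a key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce phase: collapse each group into a single total.
    return {key: sum(values) for key, values in grouped.items()}


class RunningCount:
    """Incremental style: keep totals in memory and update per event."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        # The current total for this key is available immediately.
        self.counts[event["page"]] += 1
        return self.counts[event["page"]]


# Hypothetical page-view events, standing in for Web server log records.
events = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]

running = RunningCount()
for event in events:
    running.update(event)

batch_result = batch_count(events)
```

Both approaches arrive at the same totals. The difference is when the answer becomes available: the batch version has nothing to report until the full scan completes, while the incremental version can be queried after every event.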
A dump truck, not a real-time dragster
"Hadoop is not the way you want to do real-time analytics -- it doesn't respond fast enough," said John Myers, a BI and data warehousing analyst at Enterprise Management Associates Inc. in Boulder, Colo. "Hadoop is a batch platform, more like a diesel dump truck -- it's good at taking stuff from point A to point B. There are other technologies that have a much faster response rate to get into data in near real time." For example, Myers cited data warehouse and in-memory computing appliances as more viable options for real-time data processing.
In addition, Hadoop is still a relatively new technology; as a result, there is a limited pool of IT professionals with Hadoop skills and experience. To build a real-time data analysis system based on Hadoop, an organization would need programmers and data architects versed in the nuances of both real-time BI and big data systems, plus data scientists or other analytics pros to design and build the analytical models.
While traditional BI and data warehousing platforms mask much of the technical complexity of doing that work from user organizations, that's not yet the case with analytics applications in the big data and Hadoop world, Myers said. For the most part, he added, companies have to stitch together their own systems for analyzing big data. That might include integrating data mining and predictive analytics software with components such as middleware, modeling tools and a rules engine.
Because of all those factors, organizations looking to combine real-time data analytics and big data should be prepared to pay for the privilege, Myers warned: "Hiring rock stars for these particular technologies can be a very expensive investment."
Need for DIY approach diminishing
The build-it-yourself scenario is starting to change. In 2010, Google published a paper describing Dremel, an ad hoc querying system that it had developed for internal use against its own large data stores. Now software vendors, spurred into action by Google's paper, are rolling out similar technologies for Hadoop.
For example, Cloudera Inc., a vendor in Palo Alto, Calif., that offers a distribution of Hadoop and sells support services to customers, released a public beta version of a real-time querying tool for Hadoop in October 2012. Cloudera's tool, called Impala, is scheduled for general availability in the first quarter of 2013, also as an open source offering. Chicago-based startup HStreaming LLC is also marketing tools for doing real-time analysis of data in Hadoop systems.
But just because social media data and other forms of unstructured information collected in Hadoop can now be analyzed in real time doesn't mean that doing so is crucial to the success of an analytics process, said Cindi Howson, founder of BI Scorecard, a research and consulting company in Sparta, N.J., that publishes technical evaluations of BI and analytics tools.
While Hadoop excels as a platform for storing and exploring large amounts of data, Howson and other analysts say some of the data collected in Hadoop systems could fit into a traditional BI and data warehousing environment as well. Companies might be better off moving that data out of Hadoop for analysis in order to take advantage of the more mature BI tools at their disposal -- something that could still be done in near real time. In such cases, said Colin White, president of consultancy BI Research in Ashland, Ore., Hadoop could become "just another source of data feeding the data warehouse."
Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for more than 25 years for a variety of publications and websites.
This was first published in January 2013