As the big data market matures, the focus shifts from the new data itself to its use in concert with traditional operational business data.
For the business analyst, big data can be very seductive. It exists in enormous quantities. It contains an extensive and expanding record of every interaction that makes up people’s various daily behaviors. According to all the experts, the previously unnoticed correlations it contains hold the potential for discovering customer preferences, predicting customers’ next actions, and even creating brand new business models. Trailblazing businesses in every industry, especially Internet startups, are already doing this, largely on Hadoop. The future beckons…
However, a key data source (traditional business transactions and other operational and informational data) has been largely isolated from this big data scene. Although the Hadoop store is the default destination for all big data, this older but most important data of the actual business (the customer and product records, the transactions, and so on) usually resides elsewhere entirely, in the relational databases of the business’s operational and informational systems. This data is key to many of the most useful analyses the business user may desire. The graphic above depicts how a modern customer journey accesses and creates data in a wide variety of places and formats, suggesting the range of sources required for comprehensive analytics and the importance of the final purchasing stage.
There are a number of approaches to bringing these disparate data sources together. For some businesses, copying a subset of big data to traditional platforms is the preferred tactic. Others, particularly large enterprises, prefer a data virtualization approach as described in the IDEAL architecture of Business unIntelligence. For businesses based largely in the cloud, bringing operational data into the Hadoop environment often makes sense, given that the majority of their data resides there or in other cloud platforms. The challenge that arises, however, is how to make analytics of this combined data most usable. Technical complexity and a lack of contextual information in Hadoop can be serious barriers to the adoption of big data analytics on this platform by ordinary business analysts.
To overcome these issues, four areas of improvement in today’s big data analytics are needed:
1. Combine data from traditional and new sources
2. Create context for data while maintaining agile structure
3. Support iterative, speed-of-thought analytics
4. Enable a business-user-friendly analytical interface
Big data is commonly loaded directly into Hadoop in any of a range of common formats, such as CSV, JSON, web logs and more. Operational and informational data, however, must first be extracted from its normal relational database environment before it can be loaded in a flat-file format. Careful analysis and modeling are needed to ensure that such extracts faithfully represent the actual state of the business. Such skills are often found in the ETL (extract-transform-load) teams responsible for traditional business intelligence systems, and should be applied here too.
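As a rough illustration of the extract step, the following minimal sketch pulls rows from a relational table and writes them out as CSV, one of the flat-file formats mentioned above. The `orders` table, its columns, and the `extract_to_csv` helper are hypothetical, and an in-memory SQLite database stands in for the operational system; a real extract would target the production database and write to a file destined for Hadoop.

```python
import csv
import io
import sqlite3


def extract_to_csv(conn, query, out):
    """Run `query` and write the result, with a header row, to `out` as CSV.

    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    # Column names come from the query itself, so the flat file stays self-describing.
    writer.writerow([col[0] for col in cursor.description])
    rows = 0
    for row in cursor:
        writer.writerow(row)
        rows += 1
    return rows


# Hypothetical operational table, built in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 101, 25.0, "shipped"),
        (2, 102, 40.5, "cancelled"),
        (3, 101, 10.0, "shipped"),
    ],
)

# The modeling decision (here, that only shipped orders count as business)
# is expressed in the query rather than left to downstream users.
buffer = io.StringIO()
row_count = extract_to_csv(
    conn,
    "SELECT order_id, customer_id, amount FROM orders "
    "WHERE status = 'shipped' ORDER BY order_id",
    buffer,
)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the business rule in the extract query is one way to apply the ETL team's modeling knowledge: the flat file that lands in Hadoop already reflects an agreed definition of a valid order.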
To process such data, users need to be able to define the meaning of the data before exploring and playing with it; this addresses improvement #2 above. Given analysts’ familiarity with tabular data formats, such as spreadsheets and relational tables, a simple modeling and enhancement tool that overlays such a structure on the data is a useful approach. This shields the user from the underlying programming methods.
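The idea of overlaying a tabular structure on raw data can be sketched as follows. This is only an illustration under stated assumptions, not how any particular product works: the `SCHEMA` mapping, the `apply_schema` helper, and the sample event records are all hypothetical.

```python
import json

# Hypothetical schema: column name -> type used to interpret the raw value.
SCHEMA = {"customer_id": int, "page": str, "seconds_on_page": float}


def apply_schema(raw_lines, schema):
    """Project raw JSON records onto a fixed set of typed columns.

    Fields outside the schema are ignored; fields missing from a record become None.
    The raw data itself is left unchanged, so the schema can be revised later.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for name, cast in schema.items():
            value = record.get(name)
            row[name] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Raw events as they might land in the file store: untyped, with uneven fields.
raw_events = [
    '{"customer_id": "101", "page": "/home", "seconds_on_page": "12.5", "agent": "x"}',
    '{"customer_id": "102", "page": "/cart"}',
]

table = apply_schema(raw_events, SCHEMA)
for row in table:
    print(row)
```

Because the schema is applied when the data is read rather than when it is stored, the analyst gets familiar rows and columns while the structure stays agile: changing `SCHEMA` changes the view without reloading anything.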
At the level of the physical data access and processing required to return results to the users, one approach is to translate the users’ queries into MapReduce batch programs that run directly against the Hadoop file store. Another approach adds a columnar, compressed, in-memory appliance. This provides iterative, speed-of-thought analytics, in line with improvement #3, by offering an analytic data mart sourced from Hadoop. In this environment, the analyst interacts iteratively with visual dashboards, much as with BI tools operating on top of a relational database. This top layer provides the fourth required improvement: a business-user-friendly analytical interface.
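To make the first approach concrete, here is a minimal sketch of how a simple analyst question ("total order amount per customer") decomposes into map, shuffle, and reduce phases. It runs in plain Python on a hypothetical `orders` list and does not use any Hadoop API; the function names are illustrative only, and a real translation layer would generate jobs that run in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair per record: the grouping key and the measure."""
    for record in records:
        yield record["customer_id"], record["amount"]


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each key's values; here the aggregate is a sum."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical order records, standing in for files in the Hadoop store.
orders = [
    {"customer_id": 101, "amount": 25.0},
    {"customer_id": 102, "amount": 40.5},
    {"customer_id": 101, "amount": 10.0},
]

totals = reduce_phase(shuffle(map_phase(orders)))
print(totals)
```

Each run of such a job rescans the underlying data, which is why batch translation suits one-off questions, whereas the in-memory approach described above is better matched to iterative exploration.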
The four improvement areas listed here are at the heart of Platfora’s approach to delivering big data analytics. For a more detailed explanation, as well as descriptions of a number of customer implementations, please see my white paper, “Demystifying big data analytics” or the accompanying webinar on this topic.