A buyer's guide to selecting the right big data analytics software
A collection of articles that takes you from defining technology needs to purchasing options
The growing availability of big data platforms and big data analytics tools is helping organizations gather and analyze data in search of business insights that can help them improve their products and services.
Big data analytics tools are used to analyze large data collections and to develop predictive and prescriptive models. Embedding these models in modernized business process applications can improve productivity and business value. These tools are also designed to scale easily, drawing on the resources typically available in a big data platform.
In general, the techniques that big data analytics tools provide aren't new. What is new is that data mining algorithms have only recently been adapted to give mainstream business users predictive and prescriptive analyses of data sets that combine large volumes, a variety of data types and variable structures.
From a user perspective, big data analytics is still an emerging enterprise capability, with inherent risks and a significant time investment required to achieve the expected benefits. How can you determine whether big data analytics is appropriate for your organization? Here, we examine how business use cases dovetail with fundamental data drivers such as volume and variety, to highlight where big data analytics tools can yield measurable value.
Types of big data to analyze
There's one clear notion that differentiates big data analytics from other forms of analytics: the volume, scale and diversity of the data to be analyzed. In the past, analytical models were often built and then trained through a test-and-refinement process using sample data sets pulled from very large databases. Today, however, computing platforms provide scalable storage and computational capacity, so there are few limits on the volume of data that can be analyzed. This opens up scenarios in which real-time predictive analytics and access to large volumes of the right data can improve business performance. The opportunities lie in the ability to blend and analyze different kinds of big data, such as:
Transaction data. A big data platform provides the ability to capture greater volumes of structured transaction data spanning much greater periods of time. This enables the analysis of a broader array of transaction types, extending beyond point-of-sale or e-commerce purchases to include behavior transactions, such as those captured through logging of Internet clickstream data by Web servers.
Human-generated data. Unstructured data contained in emails, documents, images, audio files, video files, and data streamed through blogs, wikis and -- especially -- social media channels provides fertile ground for using text analytics capabilities to enhance analyses.
Mobile data. As Internet-connected smartphones and tablets become ubiquitous, the apps deployed on these devices are capable of tracking and communicating numerous events. These range from in-app transactions (including recorded events such as a product search) to demographic or status reporting events (such as reporting a new geocode when there's a change in location).
Machine and sensor data. This includes data generated by functional devices such as smart utility meters, intelligent thermostats, factory machinery and network-connected home appliances. These devices can be configured to communicate automatically with other nodes in a connected mesh as well as with a centralized server that supports analysis of the data. Machine and sensor data are prime examples of data emanating from the emerging Internet of Things (IoT). Data streaming from the IoT can be used to build analytical models that continuously monitor for predictive signals (such as sensor values indicating a developing problem) and issue prescriptive directives (such as alerting a technician to service a device before it actually fails).
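To illustrate the continuous-monitoring idea for machine and sensor data, here is a minimal sketch in Python. It assumes a simple rolling z-score rule as a stand-in for a trained analytical model: each new reading is compared with the mean and standard deviation of the preceding readings and flagged when it deviates too far. The function name `flag_anomalies`, the window size and the threshold are hypothetical choices for illustration, not any vendor's API.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return (index, value) pairs for readings that deviate sharply from
    the rolling window of preceding readings (a simple z-score rule,
    used here as a hypothetical stand-in for a trained model)."""
    history = deque(maxlen=window)  # keeps only the last `window` readings
    alerts = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # Perfectly flat history: any change at all counts as a deviation.
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > threshold
            if anomalous:
                alerts.append((i, value))
        # Every reading, flagged or not, enters the window for later comparisons.
        history.append(value)
    return alerts


# Hypothetical temperature readings from one machine; the spike at index 6
# is the kind of value that might prompt a service call.
for index, value in flag_anomalies([20.0, 20.5, 19.8, 20.2, 20.1, 20.3, 35.0, 20.0]):
    print(f"Reading {index} ({value}) is out of range: alert a technician")
```

A production system would more likely learn what "normal" looks like per device from historical data; the fixed window here simply keeps the example short and easy to follow.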
Business use cases
With the different types of data in mind, big data analytics tools can be a reasonable investment if your organization is considering any of these applications:
Customer analytics. This includes analyzing customer demographics, behaviors and characteristics to develop models for segmenting customers, predicting churn and making next-best-offer recommendations to help with customer retention.
Sales and marketing analytics. There are two types of marketing use cases. The first involves using analytical models to improve how customer-facing applications make direct recommendations to the customer. Examples include better identifying opportunities for cross-selling and upselling, reducing abandoned shopping carts and generally improving the accuracy of integrated recommendation engines. The second type is more reflexive: it's intended to show the performance of the marketing group's own processes and campaigns and recommend adjustments to optimize that performance. Examples include analyzing which campaigns addressed the needs of identified customer clusters or segments, or measuring how successfully campaigns motivated recipients to respond to their calls to action.
Social media analytics. The content that streams across social media channels provides ample opportunities for analyzing customer sentiment and identifying brand risks when negative information about a company's products spreads.
Cybersecurity. Massive cybersecurity attacks on companies such as Target, Sony and Anthem highlight a growing need for businesses to rapidly recognize when cyberattacks occur. Identifying potential attacks involves building analytical models for monitoring massive amounts of data reflecting network activities and corresponding access behaviors to identify suspicious patterns that might indicate a breach.
Plant and facility management. As more devices and machines are Internet-enabled, organizations are able to collect and analyze streaming sensor data indicating continuous measures of power usage, temperature, humidity and contaminant particles, among a myriad of potential variables. Models can be developed for predicting equipment failures and scheduling pre-emptive maintenance to keep items in working order without interruption.
Pipeline management. Energy pipelines are increasingly fitted with sensors and communications capabilities. Continuous streams of sensor data can be analyzed for local and global issues that indicate a need for observation or maintenance.
Supply chain and channel analytics. Analyzing warehouse inventory, point-of-sale transactions and shipments via a variety of channels (e.g., trucking, rail, shipping) results in predictive analytical models that can help with pre-emptive replenishment, inventory management strategies, logistics management, route optimization and notifications when delays imperil timely deliveries.
Price optimization. Retailers looking to maximize overall profitability for product sales may develop analytical models that combine a variety of data streams, including competitors' prices, sales transactions across many geographic regions (to review demand), and information on production, inventories and the supply chain (to monitor supply). The resulting models can be used to dynamically adjust product prices up when supplies are low, demand is on the increase and competitors are unable to deliver, or down when inventory needs to be cleared as seasonal demand shifts.
Fraud detection. Alongside the growing risk of identity theft comes a rise in fraudulent activity and transactions. Financial institutions analyze billions of transactions to identify patterns of fraudulent behavior, and the resulting analytical models can also trigger alerts to customers when a transaction may be fraudulent.
All of these share similar characteristics: the analysis involves both structured and unstructured data, the data is accessed or streamed from a variety of sources, and the volumes are potentially massive. In turn, the analyses yield models that can identify patterns in those same sources and data streams in real time.
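To make one of these use cases concrete, the price-optimization logic described earlier (raise prices when supplies are low, demand is increasing and competitors can't deliver; lower them when inventory must be cleared as seasonal demand shifts) can be sketched as a rule evaluated each time fresh signals arrive. This is a minimal Python sketch under stated assumptions: the `MarketSignals` fields, the `adjust_price` function and the fixed 5% step are all hypothetical. In practice a model trained on the combined data streams would supply these signals and size the adjustment; here they are hard-coded inputs.

```python
from dataclasses import dataclass


@dataclass
class MarketSignals:
    # All fields are hypothetical inputs a pricing model might derive
    # from sales, inventory, supply chain and competitor price streams.
    inventory_units: int        # current supply on hand
    reorder_point: int          # below this level, supply counts as "low"
    demand_growth: float        # e.g. 0.08 means demand is up 8% period over period
    competitors_in_stock: bool  # can competitors deliver?
    clearance_needed: bool      # seasonal demand is shifting and stock must move


def adjust_price(base_price: float, s: MarketSignals, step: float = 0.05) -> float:
    """Apply the up/down pricing rules; `step` is a hypothetical fixed adjustment."""
    supply_low = s.inventory_units < s.reorder_point
    if supply_low and s.demand_growth > 0 and not s.competitors_in_stock:
        # Scarce supply, rising demand, no competing offer: price up.
        return round(base_price * (1 + step), 2)
    if s.clearance_needed:
        # Inventory has to be cleared as seasonal demand shifts: price down.
        return round(base_price * (1 - step), 2)
    # Otherwise leave the price unchanged.
    return base_price


print(adjust_price(100.0, MarketSignals(10, 50, 0.08, False, False)))
print(adjust_price(100.0, MarketSignals(500, 50, -0.10, True, True)))
```

The first call raises the price to 105.0 and the second lowers it to 95.0; any other combination of signals leaves the price where it is.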
Taking advantage of the lowered barrier to entry
Although the capabilities provided by big data analytics tools aren't entirely new, three key factors have helped to lower the barrier to entry and enabled a broader spectrum of organizations to consider adopting this type of technology:
- Cost. Tool suites are now available with pricing models that make adoption economically feasible, in contrast to early vendor products, which carried high price tags and required expensive after-sale services for integration and deployment.
- Simplicity. These tools are increasingly designed for and targeted to a non-expert community of users. Early vendor products were embraced by statisticians and mathematicians who not only built models, but also understood the details of how they worked. Today's products don't require the user to have an advanced scientific degree to gain business advantage from the resulting models.
- Performance. Scalable platforms can accommodate the data volumes and computational needs for big data analytics. Today, there are open source platforms enabling massively parallel processing over distributed storage frameworks deployed on commodity hardware whose price/performance ratios are orders of magnitude lower than what has been available in the past.
Integrating big data analytics into the enterprise
The barriers to adopting big data analytics software have fallen, enabling forward-thinking organizations to rapidly try out and integrate these tools within the enterprise. These organizations share some common characteristics in that they have:
- A data- and analytics-driven culture in which business sponsors recognize the potential of information to create corporate value;
- An environment in which the key stakeholders are aware there's a wide variety of data sources -- some static, and many others dynamically streamed -- that may contribute to analytics processes;
- An environment conducive to proofs of concept, with the agility to make technology adoption decisions quickly.
In other words, if your company exhibits some or all of these characteristics, it may be poised to take advantage of integrating big data analytics as part of the enterprise technology landscape.
Once you've determined that your organization could benefit from big data analytics tools, the next step is to establish its specific needs and prioritize the criteria that will be used to evaluate vendor products. You can then map those needs to the characteristics that big data analytics tools provide and use them as evaluation factors in a request for information (RFI) or request for proposal (RFP) shared with product vendors. The vendors' responses will help narrow the field and inform your choice of big data analytics tool.
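One common way to turn prioritized criteria into a comparison of RFI/RFP responses is a weighted scoring matrix. The Python sketch below assumes that approach; the criteria (borrowed from the cost, simplicity and performance factors discussed above), the weights, the 1-to-5 ratings and the vendor names are all hypothetical, and `score_vendors` is an illustrative helper rather than part of any tool.

```python
def score_vendors(weights: dict[str, float],
                  ratings: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rank vendors by the weighted average of their per-criterion ratings."""
    total_weight = sum(weights.values())
    scores = []
    for vendor, vendor_ratings in ratings.items():
        # A criterion the vendor didn't address counts as 0.
        weighted = sum(w * vendor_ratings.get(criterion, 0.0)
                       for criterion, w in weights.items())
        scores.append((vendor, round(weighted / total_weight, 2)))
    # Highest score first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical priorities: scalability matters most to this organization.
weights = {"cost": 0.3, "ease_of_use": 0.3, "scalability": 0.4}

# Hypothetical 1-to-5 ratings assigned after reviewing each vendor's response.
ratings = {
    "Vendor A": {"cost": 4, "ease_of_use": 3, "scalability": 5},
    "Vendor B": {"cost": 5, "ease_of_use": 4, "scalability": 3},
}

ranking = score_vendors(weights, ratings)
for vendor, score in ranking:
    print(f"{vendor}: {score}")
```

Dividing by the total weight keeps the scores on the same 1-to-5 scale as the ratings, so the weights don't have to sum to exactly 1. Treating a missing rating as zero is a deliberate choice that penalizes vendors who leave an RFI/RFP question unanswered.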