The analytics process, including the deployment and use of tools for big data analytics, can help companies improve operational efficiency, drive new revenue and gain competitive advantages over business rivals. But there are different types of analytics applications to consider.
For example, descriptive analytics focuses on describing something that has already happened, as well as suggesting its root causes. Descriptive analytics, which represents the lion's share of the analysis performed, typically hinges on basic querying, reporting and visualization of historical data.
Alternatively, more complex predictive and prescriptive modeling can help companies anticipate business opportunities and make decisions that affect profits in areas such as targeting marketing campaigns, reducing customer churn and avoiding equipment failures. With predictive analytics, historical data sets are mined for patterns indicative of future situations and behaviors, while prescriptive analytics subsumes the results of predictive analytics to suggest actions that will best take advantage of the predicted scenarios.
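To make the distinction concrete, here is a minimal sketch using the equipment-failure case. Everything in it is an assumption made for illustration: the `HISTORY` records, the vibration levels and the cost figures are invented, and a real predictive model would be far richer than a frequency count. The predictive step estimates how likely a failure is given a historical pattern; the prescriptive step takes that estimate and suggests the action with the lower expected cost.

```python
from collections import defaultdict

# Hypothetical historical records: (vibration_level, failed_within_30_days)
HISTORY = [
    ("high", True), ("high", True), ("high", False), ("high", True),
    ("low", False), ("low", False), ("low", True), ("low", False),
]

def fit_failure_rates(history):
    """Predictive step: estimate P(failure | vibration level) from past outcomes."""
    counts = defaultdict(lambda: [0, 0])  # level -> [failures, total]
    for level, failed in history:
        counts[level][0] += int(failed)
        counts[level][1] += 1
    return {level: failures / total for level, (failures, total) in counts.items()}

def recommend_action(p_failure, repair_cost=3_000, outage_cost=10_000):
    """Prescriptive step: pick the action with the lower expected cost.

    The cost figures are illustrative assumptions, not real data.
    """
    expected_cost = {
        "service_now": repair_cost,
        "keep_running": p_failure * outage_cost,
    }
    return min(expected_cost, key=expected_cost.get)

rates = fit_failure_rates(HISTORY)
for level in sorted(rates):
    print(level, rates[level], recommend_action(rates[level]))
```

With these made-up numbers, machines showing high vibration are flagged for immediate service, while low-vibration machines are left running.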
In many environments, the processing and data storage demands of advanced analytics applications have limited their adoption -- but those barriers are beginning to fall. The growing availability of big data platforms and tools for big data analytics has enabled environments in which predictive and prescriptive analytics applications can scale to handle massive data volumes originating from a wide variety of sources.
What does big data analytics mean?
In essence, big data analytics software products support predictive and prescriptive analytics applications running on big data computing platforms -- typically parallel processing systems based on clusters of commodity servers, scalable distributed storage, and technologies such as Apache Hadoop and NoSQL databases. The tools for big data analytics are designed to enable users to rapidly analyze large amounts of data, often within a real-time window.
In addition, big data analytics software provides the framework for using data mining techniques to analyze data, discover patterns, propose analytics models to recognize and react to identified patterns, and then enhance the performance of business processes by embedding the analytical models within the corresponding operational applications. For example, massive amounts of shipping delivery data, streaming traffic and weather data, and historical vendor performance data can be analyzed to devise a model for optimal selection of shipping subcontractors within geographic regions to limit the risks of late delivery or damaged goods.
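As a rough sketch of the shipping example, the code below derives a very simple selection model from historical delivery records. The vendor names, regions and records are hypothetical, and the model ignores the traffic and weather streams mentioned above; it only shows the general shape of turning historical data into a lookup that an operational application could call.

```python
from collections import defaultdict

# Hypothetical delivery history: (region, vendor, was_late, was_damaged)
DELIVERIES = [
    ("north", "AcmeFreight", False, False),
    ("north", "AcmeFreight", True, False),
    ("north", "SwiftHaul", False, False),
    ("north", "SwiftHaul", False, False),
    ("south", "AcmeFreight", False, False),
    ("south", "AcmeFreight", False, False),
    ("south", "SwiftHaul", False, True),
    ("south", "SwiftHaul", True, False),
]

def problem_rates(deliveries):
    """Share of deliveries per (region, vendor) that were late or damaged."""
    totals = defaultdict(int)
    problems = defaultdict(int)
    for region, vendor, late, damaged in deliveries:
        totals[(region, vendor)] += 1
        problems[(region, vendor)] += int(late or damaged)
    return {key: problems[key] / totals[key] for key in totals}

def best_vendor_by_region(rates):
    """Pick, for each region, the vendor with the lowest problem rate."""
    best = {}
    for (region, vendor), rate in rates.items():
        if region not in best or rate < best[region][1]:
            best[region] = (vendor, rate)
    return {region: vendor for region, (vendor, _) in best.items()}

# The resulting mapping is what an order-routing application would consult.
selection = best_vendor_by_region(problem_rates(DELIVERIES))
print(selection)
```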
Tools for big data analytics can ingest a wide variety of data types: structured data with defined and consistent fields, such as transaction data stored in relational databases; semi-structured data, such as web server or mobile application log files; and unstructured data, encompassing things like text files, documents, emails, text messages and social media posts.
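The three categories differ mainly in how much parsing is needed before analysis can start. The sketch below is illustrative only: the sample order row, log line and customer post are invented, and the log pattern assumes a common web server access-log layout rather than any specific product's format.

```python
import csv
import io
import re
from collections import Counter

# Structured: fixed, named fields, as in a relational table export.
structured = io.StringIO("order_id,customer_id,amount\n1001,42,19.99\n")
orders = list(csv.DictReader(structured))

# Semi-structured: a log line with a loose but recognizable layout.
log_line = '203.0.113.7 - - [12/Mar/2024:10:15:32 +0000] "GET /cart HTTP/1.1" 200 512'
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)
hit = LOG_PATTERN.match(log_line).groupdict()

# Unstructured: free text with no schema; here it is reduced to word counts.
post = "Late delivery again. Delivery was late and the box was damaged."
words = Counter(re.findall(r"[a-z']+", post.lower()))

print(orders[0]["amount"], hit["path"], words.most_common(3))
```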
Powering analytics: Inside big data and advanced analytics tools
A Google search for big data analytics yields a long list of vendors. However, many of these vendors provide big data platforms and tools that support the analytics process -- for example, data integration, data preparation and other types of data management software -- rather than tools that perform the analysis itself. We focus on tools that meet the following criteria:
- They provide the analyst with advanced analytics algorithms and models.
- They're engineered to run on big data platforms, such as Hadoop, or specialty high-performance analytics systems.
- They're easily adaptable to use structured and unstructured data from multiple sources.
- Their performance is capable of scaling as more data is incorporated into analytical models.
- Their analytical models can be or already are integrated with data visualization and presentation tools.
- They can easily be integrated with other technologies.
In addition, the tools must include integrated algorithms and methods that support the typical suite of data mining techniques, including, but not limited to:
- Clustering and segmentation, which divides a large collection of entities into smaller groups that exhibit some potentially unanticipated similarities. An example is analyzing a collection of customers to differentiate smaller segments for targeted marketing.
- Classification, which is a process of organizing data into predefined classes based on attributes that are either preselected by an analyst or identified as a result of a clustering model. An example is using the segmentation model to determine into which segment a new customer should be categorized.
- Regression, which is used to discover relationships between a dependent variable and one or more independent variables, and helps determine how the dependent variable's values change in relation to the independent variable values. An example is using geographic location, mean income, average summer temperature and square footage to predict the future value of a property.
- Association and item set mining, which looks for statistically relevant relationships among variables in a large data set. For example, this could help direct call-center representatives to offer specific incentives based on the caller's customer segment, duration of relationship and type of complaint.
- Similarity and correlation, which is used to inform undirected clustering algorithms. Similarity scoring algorithms can be used to determine the similarity of entities placed in a candidate cluster.
- Neural networks, which are used in undirected analysis for machine learning based on adaptive weighting and approximation.
This is just a subset of the types of analyses used for predictive and prescriptive analytics. In addition, different vendors are likely to provide a variety of algorithms supporting each of the different methods.
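The customer segmentation example that runs through the clustering and classification items can be sketched in a few lines. This is a bare-bones k-means written for illustration, not the algorithm any particular vendor ships; the customer data, the two features and the choice of starting centroids are all assumptions. Clustering discovers the segments, and classification then places a new customer into one of them.

```python
import math

# Hypothetical customers: (annual_spend_in_thousands, store_visits_per_month)
CUSTOMERS = [(2, 1), (3, 2), (2.5, 1.5), (20, 8), (22, 9), (21, 10)]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute means."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else c
            for group, c in zip(groups, centroids)
        ]
    return centroids

# Clustering: start from two arbitrary customers as initial centroids.
segments = k_means(CUSTOMERS, centroids=[CUSTOMERS[0], CUSTOMERS[1]])

# Classification: assign a new customer to one of the discovered segments.
new_customer = (19, 7)
segment_id = nearest(new_customer, segments)
print(segments, segment_id)
```

On this toy data the two centroids settle on a low-spend and a high-spend group, and the new customer lands in the high-spend segment. Production tools typically add feature scaling, smarter initialization and a way to choose the number of clusters.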
The advanced analytics tools market
The market for advanced analytics tools has evolved over time, and the available tools vary in maturity and, consequently, in capability and ease of use. For example, there are tools with relatively long histories from some mega-vendors, like IBM, Oracle and SAS. Other large vendors, such as Microsoft, Dell EMC, Teradata and SAP, offer tools with a more recent history, in some cases obtained by acquiring the companies that built them.
A number of smaller companies provide big data analytics products, including Angoss, Predixion Software, Alteryx, Alpine Data Labs, Pentaho, Knime and RapidMiner. In some cases, companies have developed their own suite of algorithms. Others have adopted the open source statistical R language and provide predictive and prescriptive modeling capabilities using R's features, or they use the software from the open source Weka project.
The third category consists of products available as open source technologies. Examples include the previously mentioned R language and Weka, as well as the Mahout software distribution that's part of the Hadoop stack.
In some of these cases -- particularly with the mega-vendors -- the tools for big data analytics are incorporated into larger big data enterprise suites. In others, the tools are sold as stand-alone products. In the latter case, it's the customer's job to integrate with the big data platform being deployed.
Most of the tools provide a visual interface to guide the analytics processes -- data mining and discovery analysis, evaluation and scoring of models, integration with operational environments -- and, in most cases, the vendors provide guidance and services to get the customer up and running.
Who uses big data and advanced analytics tools?
While some individuals in the organization will look to explore and devise new predictive models, others will look to embed these models within their business processes, and still others will want to understand the overall impact that these tools will have on the business. In other words, organizations that are adopting big data analytics need to accommodate a variety of user types, such as:
- The data scientist, who likely performs more complex analyses involving more complex data types, and who is familiar with how the underlying models are designed and implemented to assess inherent dependencies or biases.
- The business analyst, who is likely a more casual user looking to use the tools for proactive data discovery or visualization of existing information, as well as some predictive analytics.
- The business manager, who is looking to understand the models and conclusions.
- IT developers, who support all the prior categories of users.
All of these roles would typically work together in the model development lifecycle. The data scientist subjects a swath of big data sets to the undirected analyses provided and looks for any patterns that would be of business interest. The business analyst is then engaged to review how the models work and to evaluate how each of the discovered models or patterns could benefit the business. After that, the business manager and IT teams should be brought in to embed or integrate the models into business processes or to devise new processes around the models.
From a market perspective, though, it's interesting to consider the types of businesses that are embracing big data analytics. Many of the early users of big data technologies were companies such as Google, Yahoo, Facebook, LinkedIn and Netflix, or analytics services providers. Each of these companies relied on operational and analytical applications that had to ingest, process and analyze fast-flowing streams of data, and then feed the results back to continuously improve performance.
As appetites for data expand among companies in more mainstream industries, big data analytics has found a place in a more general corporate population. In the past, the cost of a large-scale analytics platform would have limited adoption to only the very largest businesses.
However, the availability of utility-style hosted big data platforms, such as those from Amazon Web Services, and the ability to instantiate big data platforms such as Hadoop on premises without a large investment, have lowered the barrier to entry. In addition, open data sets and access to firehose data feeds from social media channels provide the raw materials for larger-scale data analyses when blended with internal data sets.
Larger businesses may still opt for high-end tools for big data analytics, but lower-cost alternatives deployed on cost-effective platforms can enable SMBs to evaluate and launch big data analytics programs and achieve the business improvements they are after.
Now that we've examined the different types of tools and their uses, the next step is to determine how these big data analytics tools could benefit your company. By taking a look at the various use cases for big data analytics, you will begin to see where a general big data analytics capability can be leveraged for creating and enhancing value.