big data analytics

Contributor(s): Lisa Martinek and Craig Stedman

Big data analytics is the process of examining large and varied data sets -- i.e., big data -- to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more-informed business decisions.

Big data analytics benefits

Driven by specialized analytics systems and software, big data analytics can point the way to various business benefits, including new revenue opportunities, more effective marketing, better customer service, improved operational efficiency and competitive advantages over rivals.

Big data analytics applications enable data scientists, predictive modelers, statisticians and other analytics professionals to analyze growing volumes of structured transaction data, plus other forms of data that are often left untapped by conventional business intelligence (BI) and analytics programs. That encompasses a mix of semi-structured and unstructured data -- for example, internet clickstream data, web server logs, social media content, text from customer emails and survey responses, mobile-phone call-detail records and machine data captured by sensors connected to the internet of things.

On a broad scale, data analytics technologies and techniques provide a means of analyzing data sets and drawing conclusions about them to help organizations make informed business decisions. BI queries answer basic questions about business operations and performance. Big data analytics is a form of advanced analytics, which involves complex applications with elements such as predictive models, statistical algorithms and what-if analyses powered by high-performance analytics systems.

Emergence and growth of big data analytics

The term big data was first used to refer to increasing data volumes in the mid-1990s. In 2001, Doug Laney, then an analyst at consultancy Meta Group Inc., expanded the notion of big data to also include increases in the variety of data being generated by organizations and the velocity at which that data was being created and updated. Those three factors -- volume, velocity and variety -- became known as the 3Vs of big data, a concept Gartner popularized after acquiring Meta Group and hiring Laney in 2005.

Separately, the Hadoop distributed processing framework was launched as an Apache open source project in 2006, planting the seeds for a clustered platform built on top of commodity hardware and geared to run big data applications. By 2011, big data analytics began to take a firm hold in organizations and the public eye, along with Hadoop and various related big data technologies that had sprung up around it.

Initially, as the Hadoop ecosystem took shape and started to mature, big data applications were primarily the province of large internet and e-commerce companies, such as Yahoo, Google and Facebook, as well as analytics and marketing services providers. In ensuing years, though, big data analytics has increasingly been embraced by retailers, financial services firms, insurers, healthcare organizations, manufacturers, energy companies and other mainstream enterprises.

Big data analytics technologies and tools

Unstructured and semi-structured data types typically don't fit well in traditional data warehouses that are based on relational databases oriented to structured data sets. Furthermore, data warehouses may not be able to handle the processing demands posed by sets of big data that need to be updated frequently -- or even continually, as in the case of real-time data on stock trading, the online activities of website visitors or the performance of mobile applications.

As a result, many organizations that collect, process and analyze big data turn to NoSQL databases as well as Hadoop and its companion tools, including:

  • YARN: a cluster management technology and one of the key features in second-generation Hadoop.
  • MapReduce: a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers.
  • Spark: an open-source parallel processing framework that enables users to run large-scale data analytics applications across clustered systems.
  • HBase: a column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS).
  • Hive: an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files.
  • Kafka: a distributed publish-subscribe messaging system designed to replace traditional message brokers.
  • Pig: an open-source technology that offers a high-level mechanism for the parallel programming of MapReduce jobs to be executed on Hadoop clusters.
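
To make the MapReduce entry in the list above more concrete, the following is a minimal single-machine sketch of the map, shuffle and reduce steps, written in plain Python. It is an illustration of the programming model only, not Hadoop's actual API; the function names and the sample log lines are assumptions made for this example. In a real cluster, the framework would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input standing in for unstructured log data.
log_lines = ["error disk full", "warning disk slow", "error network down"]
counts = word_count(log_lines)
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase depends only on its own key, both steps can be spread across machines, which is the property the framework exploits.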

In some cases, Hadoop clusters and NoSQL systems are being used primarily as landing pads and staging areas for data before it gets loaded into a data warehouse or analytical database for analysis, usually in a summarized form that is more conducive to relational structures.

More frequently, however, big data analytics users are adopting the concept of a Hadoop data lake that serves as the primary repository for incoming streams of raw data. In such architectures, data can be analyzed directly in a Hadoop cluster or run through a processing engine like Spark. As in data warehousing, sound data management is a crucial first step in the big data analytics process. Data being stored in the Hadoop Distributed File System must be organized, configured and partitioned properly to get good performance on both extract, transform and load (ETL) integration jobs and analytical queries. 
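
As a rough illustration of what partitioning can look like, the sketch below assigns records to date-based directories that follow the key=value naming convention used by Hive-style partitioned tables. The base directory, the record fields and the helper names are hypothetical, and the code only computes paths in memory; it does not write to HDFS.

```python
from collections import defaultdict
from datetime import date


def partition_path(base_dir, event_date):
    """Build a Hive-style partition directory for one calendar day."""
    return (
        f"{base_dir}/year={event_date.year}"
        f"/month={event_date.month:02d}/day={event_date.day:02d}"
    )


def group_by_partition(base_dir, records):
    """Map each partition directory to the records that belong in it."""
    partitions = defaultdict(list)
    for record in records:
        path = partition_path(base_dir, record["event_date"])
        partitions[path].append(record)
    return dict(partitions)


# Hypothetical clickstream records; only the date drives the layout.
records = [
    {"user": "a", "event_date": date(2017, 3, 1)},
    {"user": "b", "event_date": date(2017, 3, 1)},
    {"user": "c", "event_date": date(2017, 3, 2)},
]
layout = group_by_partition("/data/clicks", records)
```

With a layout like this, a query that filters on a single day only needs to read the files under that day's directory, which is one reason partitioning choices affect both ETL jobs and analytical queries.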

Once the data is ready, it can be analyzed with the software commonly used in advanced analytics processes. That includes tools for data mining, which sift through data sets in search of patterns and relationships; predictive analytics, which build models for forecasting customer behavior and other future developments; machine learning, which tap algorithms to analyze large data sets; and deep learning, a more advanced offshoot of machine learning.

Text mining and statistical analysis software can also play a role in the big data analytics process, as can mainstream BI software and data visualization tools. For both ETL and analytics applications, queries can be written in batch-mode MapReduce; programming languages, such as R, Python and Scala; and SQL, the standard language for relational databases that's supported via SQL-on-Hadoop technologies.

Big data analytics uses and challenges

Big data analytics applications often include data from both internal systems and external sources, such as weather data or demographic data on consumers compiled by third-party information services providers. In addition, streaming analytics applications are becoming common in big data environments, as users look to do real-time analytics on data fed into Hadoop systems through Spark's Spark Streaming module or other open source stream processing engines, such as Flink and Storm.
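
Streaming analytics often comes down to aggregating events over short time windows as they arrive. The sketch below shows that idea with a tumbling (fixed, non-overlapping) window in plain Python. It is not the API of Spark Streaming, Flink or Storm; the function name and the sample click events are assumptions, and it expects events to arrive in timestamp order.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) for each fixed-size window.

    `events` is an iterable of (timestamp_seconds, key) pairs that is
    assumed to be sorted by timestamp.
    """
    current_window = None
    counts = Counter()
    for timestamp, key in events:
        window_start = (timestamp // window_seconds) * window_seconds
        if current_window is not None and window_start != current_window:
            # The previous window is complete, so emit its result.
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        # Emit the final, possibly partial, window.
        yield current_window, dict(counts)


# Hypothetical page-view events: (seconds since start, page name).
clicks = [(0, "home"), (3, "cart"), (9, "home"), (12, "home"), (25, "cart")]
results = list(tumbling_window_counts(clicks, 10))
```

A production stream processor would also have to deal with late or out-of-order events and with state that survives failures, which this sketch leaves out.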

Early big data systems were mostly deployed on-premises, particularly in large organizations that were collecting, organizing and analyzing massive amounts of data. But cloud platform vendors, such as Amazon Web Services (AWS) and Microsoft, have made it easier to set up and manage Hadoop clusters in the cloud, as have Hadoop suppliers such as Cloudera and Hortonworks, which support their distributions of the big data framework on the AWS and Microsoft Azure clouds. Users can now spin up clusters in the cloud, run them for as long as needed and then take them offline, with usage-based pricing that doesn't require ongoing software licenses.

Potential pitfalls that can trip up organizations on big data analytics initiatives include a lack of internal analytics skills and the high cost of hiring experienced data scientists and data engineers to fill the gaps.

The amount of data that's typically involved, and its variety, can cause data management issues in areas including data quality, consistency and governance; also, data silos can result from the use of different platforms and data stores in a big data architecture. In addition, integrating Hadoop, Spark and other big data tools into a cohesive architecture that meets an organization's big data analytics needs is a challenging proposition for many IT and analytics teams, which have to identify the right mix of technologies and then put the pieces together.

This was last updated in March 2017


Join the conversation


How is "big data" different from "data mining"?
Big data refers to the full set of information, while data mining comprises the techniques you use to analyze data in general, whether big data or small data.
I would like to know the role of intelligent software agents in big data analytics.
Can anyone start his or her career in data analytics? What basics does it need?
At a very high level, data mining is looking for data based on specific requests from the client. Big data is analyzing patterns to understand business and create new analytics.
Thanks. Great piece. The competition has changed during the past two years, though, and as mentioned, Hadoop and especially MapReduce platforms have gained much more attention and importance. Due to the variety of data sources and the amount of data, players such as Tableau, Splunk and Cloudera are getting more and more attention.
How could big data help in segmenting the needs of different customer groups?
What is the difference between using a traditional data warehouse with a solution on top of it (like Cloudera) and using Hadoop for big data analytics (something like Hunk (Splunk) or Datameer)? Which one is better, specifically for a medium-size company?
Having understood what big data is all about, can someone please give a list of all the popular big data software innovators? I have a small list that includes companies like Amazon, IBM, etc. What I need is something that is affordable for my company. I've heard of a company called Qburst Technologies that offers its customers satisfaction coupled with low pricing.
Big data analytics is becoming a trending topic. One of the biggest benefits comes when you take the technology and use it for the healthcare industry. Companies like Due North Analytics are able to take patients' data to determine how effective treatment is, as well as prescriptions and future cost, all of which help the healthcare industry become more efficient. Learn more about their company and predictive analytics.
What kind of big data analytics challenges does your organization face? And what are you doing to overcome them?
There are many issues an organization faces if it implements big data.
Mainly there are performance issues; if the system architecture allows optimization, then those issues can be resolved.

Another issue is data accuracy and validation.

Having gone through several writings on big data analytics, I am convinced that its application in certain areas of our operation could increase our market share and ultimately enhance our bottom line as a bank playing in the retail sector.
Big data is the most important aspect that everyone has to be aware of in the field of business.
If one wants to be in some of the best management companies, one must know about all these aspects.
To start your career it is a good idea to get familiar with the latest tools after you have a basic understanding. ~ Christopher Gruden, Cleveland, OH

Some tools I suggest:


~ Christopher Gruden, Cleveland, OH

And one more: Talend Open Studio

~ Christopher Gruden, Cleveland, OH

And Skytree Server. ~ Christopher Gruden, Cleveland, OH