big data analytics

Contributor(s): Mark Labbe, Lisa Martinek and Craig Stedman

Big data analytics is the often complex process of examining big data to uncover information -- such as hidden patterns, correlations, market trends and customer preferences -- that can help organizations make informed business decisions.

On a broad scale, data analytics technologies and techniques provide a means to analyze data sets and draw new information from them. Business intelligence (BI) queries answer basic questions about business operations and performance.

Big data analytics is a form of advanced analytics, which involves complex applications with elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.

The importance of big data analytics

Big data analytics, driven by specialized systems and software, can lead to positive business-related outcomes:

  • New revenue opportunities
  • More effective marketing
  • Better customer service
  • Improved operational efficiency
  • Competitive advantages over rivals

Big data analytics applications allow data analysts, data scientists, predictive modelers, statisticians and other analytics professionals to analyze growing volumes of structured transaction data, plus other forms of data that are often left untapped by conventional BI and analytics programs. This includes a mix of semi-structured and unstructured data: for example, internet clickstream data, web server logs, social media content, text from customer emails and survey responses, mobile phone records, and machine data captured by sensors connected to the internet of things (IoT).

Big data analytics is a form of advanced analytics, which has marked differences compared to traditional BI.

How big data analytics works

In some cases, Hadoop clusters and NoSQL systems are used primarily as landing pads and staging areas for data before it gets loaded into a data warehouse or analytical database for analysis -- usually in a summarized form that is more conducive to relational structures.

More frequently, however, big data analytics users are adopting the concept of a Hadoop data lake that serves as the primary repository for incoming streams of raw data. In such architectures, data can be analyzed directly in a Hadoop cluster or run through a processing engine like Spark. As in data warehousing, sound data management is a crucial first step in the big data analytics process. Data stored in the Hadoop Distributed File System (HDFS) must be organized, configured and partitioned properly to get good performance out of both extract, transform and load (ETL) integration jobs and analytical queries.
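As a rough illustration of what "partitioned properly" can mean in practice, the sketch below groups raw clickstream records into date-based directories using the key=value layout that is common in Hadoop-style data lakes. The root path /data/clickstream, the record fields and the function names are assumptions made for this example only; they are not taken from any particular product.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_path(event, root="/data/clickstream"):
    """Build a key=value partition directory from the event's UTC timestamp.

    `root` and the `ts` field (Unix seconds) are hypothetical choices.
    """
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return f"{root}/year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}"


def group_by_partition(events, root="/data/clickstream"):
    """Group events by the partition directory they would be written to."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_path(event, root)].append(event)
    return dict(partitions)


# Illustrative records, not real data.
events = [
    {"ts": 1593561600, "user": "a", "url": "/home"},      # 2020-07-01 00:00 UTC
    {"ts": 1593565200, "user": "b", "url": "/pricing"},   # 2020-07-01 01:00 UTC
    {"ts": 1593648000, "user": "a", "url": "/checkout"},  # 2020-07-02 00:00 UTC
]

for path, rows in sorted(group_by_partition(events).items()):
    print(path, len(rows))
```

With a layout like this, a query that filters on a single day only has to read that day's directory rather than the whole data set, which is one reason partitioning affects both ETL and query performance.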

Once the data is ready, it can be analyzed with the software commonly used for advanced analytics processes. That includes tools for:

  • data mining, which sifts through data sets in search of patterns and relationships;
  • predictive analytics, which builds models to forecast customer behavior and other future developments;
  • machine learning, which taps algorithms to analyze large data sets; and
  • deep learning, a more advanced offshoot of machine learning.

Text mining and statistical analysis software can also play a role in the big data analytics process, as can mainstream business intelligence software and data visualization tools. For both ETL and analytics applications, queries can be written in MapReduce, with programming languages such as R, Python and Scala, or in SQL, the standard language for relational databases, which is supported via SQL-on-Hadoop technologies.
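To make the MapReduce style of query more concrete, here is a minimal single-process sketch of its map, shuffle and reduce phases in plain Python. It counts page hits per URL, roughly what a SQL query of the form SELECT url, COUNT(*) ... GROUP BY url would return. The three-field log format and all function names are assumptions for illustration; a real Hadoop job would distribute these phases across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (url, 1) for each well-formed log line.

    Assumes a hypothetical "ip url status" format; other lines are skipped.
    """
    parts = line.split()
    if len(parts) == 3:
        yield parts[1], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence and return {url: hits}."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Illustrative log lines, not real data.
log_lines = [
    "10.0.0.1 /home 200",
    "10.0.0.2 /pricing 200",
    "10.0.0.3 /home 404",
    "malformed line",
]

print(run_job(log_lines))
```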

Big data analytics uses and challenges


Big data analytics applications often include data from both internal systems and external sources, such as weather data or demographic data on consumers compiled by third-party information services providers. In addition, streaming analytics applications are becoming common in big data environments as users look to perform real-time analytics on data fed into Hadoop systems through stream processing engines, such as Spark, Flink and Storm.
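One common building block of streaming analytics is the tumbling window, which buckets events into fixed, non-overlapping time intervals and aggregates each bucket. The sketch below shows the idea in plain Python under simplifying assumptions: events arrive as (timestamp in seconds, key) pairs, and the function name and 60-second default are hypothetical. Production stream processing engines also handle late or out-of-order events and distributed state, which this sketch does not attempt.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per key within fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns a dict mapping each window's start time to a Counter of keys.
    """
    windows = {}
    for ts, key in events:
        # Round the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        windows.setdefault(window_start, Counter())[key] += 1
    return windows


# Illustrative events, not real data.
stream = [(0, "a"), (30, "a"), (59, "b"), (60, "a"), (125, "b")]

for start, counts in sorted(tumbling_window_counts(stream).items()):
    print(start, dict(counts))
```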

Early big data systems were mostly deployed on premises, particularly in large organizations that collected, organized and analyzed massive amounts of data. But cloud platform vendors, such as Amazon Web Services (AWS) and Microsoft, have made it easier to set up and manage Hadoop clusters in the cloud. The same goes for Hadoop suppliers such as Cloudera-Hortonworks, which supports the distribution of the big data framework on the AWS and Microsoft Azure clouds. Users can now spin up clusters in the cloud, run them for as long as they need and then take them offline with usage-based pricing that doesn't require ongoing software licenses.

Big data has become increasingly beneficial in supply chain analytics. Big supply chain analytics uses big data and quantitative methods to enhance decision-making processes across the supply chain. Specifically, it expands data sets for analysis beyond the traditional internal data found in enterprise resource planning (ERP) and supply chain management (SCM) systems. It also applies statistical methods to new and existing data sources. The insights gathered facilitate better informed and more effective decisions that benefit and improve the supply chain.

Potential pitfalls of big data analytics initiatives include a lack of internal analytics skills and the high cost of hiring experienced data scientists and data engineers to fill the gaps.

Big data analytics involves analyzing structured and unstructured data.

Emergence and growth of big data analytics

The term big data was first used to refer to increasing data volumes in the mid-1990s. In 2001, Doug Laney, then an analyst at consultancy Meta Group Inc., expanded the notion of big data to also encompass increases in the variety of data being generated by organizations and the velocity at which that data was being created and updated. Those three factors -- volume, velocity and variety -- became known as the 3Vs of big data, a concept Gartner popularized after acquiring Meta Group and hiring Laney in 2005.

Separately, the Hadoop distributed processing framework was launched as an Apache open source project in 2006. This planted the seeds for a clustered platform built on top of commodity hardware and geared to run big data applications. By 2011, big data analytics began to take a firm hold in organizations and the public eye, along with Hadoop and various related big data technologies that had sprung up around it.

Initially, as the Hadoop ecosystem took shape and started to mature, big data applications were primarily the province of large internet and e-commerce companies such as Yahoo, Google and Facebook, as well as analytics and marketing services providers. In the ensuing years, though, big data analytics has increasingly been embraced by retailers, financial services firms, insurers, healthcare organizations, manufacturers, energy companies and other enterprises.

This was last updated in July 2020

Join the conversation


How is "big data" different from "data mining"?
Big data refers to the full set of information, and data mining covers the techniques you use to analyze data in general: big data, small data and so on.
The two are as different as night and day.

Data mining is when you gather data with the help of a bot, crawler or other methods.
Big data, on the other hand, is when you try to make sense of the gathered data or try to get something meaningful or useful out of it.
I would like to know the role of intelligent software agents in big data analytics.
Can anyone start his or her career in data analytics? What basics does it need?
Yes and no. It all depends on your experience and knowledge in the field. Below is a good article to get a high-level idea of career opportunities in big data and what each of them takes to enter.
At a very high level, data mining is looking for data based on specific requests from the client. Big data is analyzing patterns to understand the business and create new analytics.
Thanks. Great piece. The competition has changed during the past two years, though, and as mentioned, Hadoop and especially MapReduce platforms have received much more attention and importance. Due to the variety of data sources and the amount of data, players such as Tableau, Splunk and Cloudera are getting more and more attention.
How could big data help in segmenting the needs of different customer groups?
Having understood what big data is all about, can someone please give a list of all the popular big data software innovators? I have a small list that includes companies like Amazon and IBM. What I need is something affordable for my company. I've heard of a company called Qburst Technologies that is said to offer customer satisfaction coupled with low pricing.
What kind of big data analytics challenges does your organization face? And what are you doing to overcome them?
There are many issues an organization faces if it implements big data.
The main ones are performance issues; if the system architecture allows optimization, these issues can be resolved.

Another issue is data accuracy and validation.

Having gone through several writings on big data analytics, I am convinced that applying it in certain areas of our operation could increase our market share and ultimately enhance our bottom line as a bank in the retail sector.
Big data is the most important aspect that everyone in the field of business has to be aware of.
If one wants to be in some of the best management companies, one must know about all these aspects.
To start your career it is a good idea to get familiar with the latest tools after you have a basic understanding. ~ Christopher Gruden, Cleveland, OH

Some tools I suggest:


~ Christopher Gruden, Cleveland, OH

And one more: Talend Open Studio

~ Christopher Gruden, Cleveland, OH

And Skytree Server. ~ Christopher Gruden, Cleveland, OH
Ours is a company with a large amount of time series data at millisecond resolution. The approximate data storage size per day is 150GB. What sort of big data applications can be used for time series data?
What does a data scientist actually do? Can someone explain it briefly with an example?
Hello Sgilan! I suggest you check out our definition for data scientist. It discusses roles, skills, responsibilities and much more. 
I am doing a Big Data Hadoop course from .. How can I differentiate between Big Data and Data Science jobs?
Need to create a marketing plan to generate sales using Big Data Analytics
wow glad to read this, thank you so much
Data visualisation is an integral part of big data analytics. It is very important.
Agreed! Data storytelling is almost an art form, and visualization plays an important role in sharing information.
I have gone through the following information and am looking forward to an example:
Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

Thanks for sharing your experiences with us, and keep going.