
Emerging analytics tools challenge dominant big data philosophy

Analytics technologies like the internet of things and cognitive computing make it impossible to ever have all the data on a given subject. So, what does that mean for big data?

There is a view of big data that says with large enough data sets, statistical methods of analytics are unnecessary. Call it the N = all philosophy. Sampling and inference are a waste of time, it says. We have all the data. Just let the data speak.

While the N = all big data philosophy was revolutionary just a few short years ago, it's quickly becoming outmoded as new and potentially more valuable analytics methods are coming online. Internet of things (IoT) analytics and cognitive computing are making the notion of having all the data for a given subject seem quaint, challenging the popular perception of big data and demanding that analytics professionals re-evaluate their practices.

An early formulation of the N = all framework came in 2008 -- the dawn of the big data age. Writing in Wired, Chris Anderson pointed to examples where petabyte-size data stores were making answers apparent in areas like advertising and biology. Large enough data sets meant researchers didn't even need to formulate questions or hypotheses. The numbers speak for themselves. But things are not so straightforward when you talk about emerging analytics technologies.

IoT precludes having all the data

When it comes to IoT, the technology's very nature precludes ever having all the data. It involves a constant stream of incoming information that is refreshed every second. And rather than seeking to acquire a critical mass of data in which answers to retrospective questions become apparent, you're looking for a signal that will tell you something about the moment.

The most effective IoT strategies recognize how different the technology is from traditional perspectives of big data. Edge analytics has become a critical component of success with IoT. This involves embedding statistical algorithms in networking devices and sensors at the edge of networks that process data as it's created, deciding which data to send back to centralized data stores and which data to throw away. In this scenario, more data is just a burden. You would never want all the data. We're back to using statistical methods to determine what data to use.
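To make the idea concrete, here is a minimal sketch of the kind of decision an edge device might make -- forward only readings that look like a signal, discard the rest. The function name, window size and threshold are illustrative assumptions, not from any particular product or the article itself.

```python
import statistics

def edge_filter(readings, window=20, threshold=3.0):
    """Decide which sensor readings to forward to the central store.

    Keeps a rolling window of recent values and forwards only readings
    that deviate from the rolling mean by more than `threshold` standard
    deviations. Everything else is discarded at the edge.
    """
    recent = []     # rolling window of recent readings
    forwarded = []  # readings judged worth sending upstream
    for value in readings:
        if len(recent) >= window:
            mean = statistics.mean(recent)
            stdev = statistics.stdev(recent)
            if stdev > 0 and abs(value - mean) / stdev > threshold:
                forwarded.append(value)  # signal: send upstream
            recent.pop(0)
        recent.append(value)
    return forwarded

# A steady stream with one spike: only the spike survives the edge.
stream = [10.0, 10.1, 9.9] * 10 + [55.0] + [10.0] * 5
print(edge_filter(stream))  # -> [55.0]
```

In this sketch, dozens of routine readings produce a single upstream message -- the statistics run where the data is born, and the central store never sees "all the data."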

Cognitive computing offers different challenge

The challenge to the dominant big data philosophy presented by cognitive computing is somewhat different. In cognitive computing -- whether we're talking about artificial intelligence (AI) or deep learning -- there is no point at which you can ever have enough data. Algorithms improve through experience, and the more training they receive, the better they will perform.

Take, for example, AlphaGo, Google's Go-playing AI algorithm. It first learned to play the ancient board game by ingesting a library of 30 million game moves made by human players. This served as the algorithm's initial training, but it wasn't enough. It then played thousands of games against itself, improving with every match. Eventually, it improved to the point where it was able to beat the world's top human players.

But, theoretically, the algorithm could still optimize itself through continued learning. You could never say it has learned all there is to know about playing the game. The same goes for other deep learning exercises like speech recognition, computer vision and natural language processing. Humans -- the closest analogue to cognitive computing algorithms -- continue to learn ways to communicate with each other and describe the world around them throughout their lifetimes. There's no reason to think that an algorithm could ever acquire all the data needed to perform these tasks optimally.
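A simple analogy illustrates why the learning is never finished. The sketch below, which stands in for any learning system, treats the sample mean of a noisy signal as the "model": more data keeps shrinking the error, but the error never reaches exactly zero, so there is no finite point at which you could say you had all the data. The function and numbers here are hypothetical, chosen only to illustrate the argument.

```python
import random

random.seed(42)

def estimate_error(n_samples):
    """Error of a toy 'model' -- the sample mean of a noisy signal
    whose true value is 0.5 -- after training on n_samples points."""
    samples = [random.random() for _ in range(n_samples)]
    estimate = sum(samples) / n_samples
    return abs(estimate - 0.5)

# More data keeps improving the model, but the improvement only
# approaches zero asymptotically -- training is never "complete."
for n in (10, 1_000, 100_000):
    print(n, estimate_error(n))
```

The same shape appears in real learning curves: large gains early, ever smaller but never-zero gains later.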

Time to rethink nature of big data

There was a time when the N = all big data philosophy was considered the ultimate value proposition. Surveying entire data sets seemed to promise exact answers to specific questions, rather than relying on statistical methods that necessarily carry a measure of uncertainty.
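It's worth noting how small that trade-off often is in practice. The sketch below -- a hypothetical data set, not an example from the article -- compares the exact N = all answer to an aggregate question against a 1% random sample with a conventional 95% margin of error:

```python
import random

random.seed(7)

# A "complete" data set: one numeric value per record.
population = [random.gauss(100.0, 15.0) for _ in range(1_000_000)]

# N = all: scan every record for the exact answer.
exact_mean = sum(population) / len(population)

# The statistical alternative: a 1% sample plus a margin of error.
sample = random.sample(population, 10_000)
sample_mean = sum(sample) / len(sample)
variance = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)
margin = 1.96 * (variance / len(sample)) ** 0.5  # ~95% confidence

print(f"exact:  {exact_mean:.2f}")
print(f"sample: {sample_mean:.2f} +/- {margin:.2f}")
```

The sample answer lands within a fraction of a unit of the exact one at 1% of the cost -- the uncertainty is real, but it is quantified and small.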

But, increasingly, the real value for enterprises and their ability to differentiate themselves in their markets is going to come from these emerging analytics trends. Businesses that want to stay ahead of the curve will need to rethink what big data is all about.
