By now, many data management professionals are familiar with the "three Vs" definition of big data: volume, velocity and variety. But as databases of all types become increasingly proficient at handling large volumes of streaming data, the last V, variety, may prove the trickiest piece of the big data puzzle to solve.
Speaking at the SAS Premier Business Leadership Series, David Judson, senior director of business intelligence initiatives at Scotts Miracle-Gro Company, said when he took over responsibility for the organization's big data initiatives in 2011, no one was concerned about data volume or the speed with which it was coming into databases. The existing infrastructure was sufficient to handle the load.
However, Judson was concerned about data variety. When looking to build customer profiles, he found that indicators of who might be ready to buy Scotts' products were buried in sources like local weather reports, social media chatter and other forms of online content. It's a problem he's still trying to solve: much of the intelligence in that data never gets put into a format that can be stored or analyzed.
Scotts has a traditional enterprise data warehouse, Judson said, but it is currently looking at how Hadoop might be used to store and analyze mixed-media data.
The anecdote points to the problem with the traditional definition of big data. It suggests the challenges of dealing with large data sets mainly involve scale. But in truth, the toughest problems come from reconciling different types of data that may be found in large data sets.
Tom Davenport, co-founder and research director at the International Institute for Analytics in Portland, Ore., said most large companies are concerned with data variety. He recently completed a report, titled Big Data in Big Companies, in which he and his fellow researchers reviewed the analytics practices of major corporations, such as GE, UPS and Citibank. These companies rarely mentioned data volume or velocity as top concerns, he said.
This shows the most common definition of big data is somewhat unhelpful, Davenport said. It fails to adequately capture what corporations should be thinking about when they are looking at implementing big data technology. He believes the term will eventually be replaced by something more specific, but it's not clear yet exactly what that will be.
"Variety is the key piece of it to think about," Davenport said.
Jill Dyche, vice president of best practices at SAS Institute and Davenport's co-researcher, said the three Vs definition of big data served its purpose: it broadly framed a new technological concept in a way that helped people get their heads around it. But at this point the term has outlived its usefulness, she said, and it is nearly time to replace it with something that more adequately captures the challenges presented by large data sets.
However, Dyche said the term big data may be worth keeping around a little longer simply because it tends to pique the interest of executives. They may not know exactly what it means, but they've often read a magazine article about it or seen some other pop-culture reference, she said, which leads them to conclude it is something they need.
"I think it's been fortuitous in that executives are now paying attention to data," Dyche said. "The fact that executive management is coming to people and saying, 'We need to be doing this,' is sort of a sea change."