Mercutio arrives on stage in William Shakespeare's classic Romeo and Juliet with tornadic energy. During his first scene, he goes head-to-head with Romeo's melancholy, using witty remarks, puns and a famous speech about Queen Mab.
The term "big data" has followed in Mercutio's footsteps: It swept onto the analytics and business intelligence (BI) scene with significant force and frenetic energy, usually accompanied by its catchy "three V" (volume, velocity and variety) tagline. It's been embraced in a way most experts agree is rare, and along with it came new technologies and techniques, and even business models.
Some of the same experts who acknowledge the industry's love affair with big data also say that the buildup and excessive promotion have led to flawed thinking. Data, for example, has always been "big" and always will be, they say. And while big data is commonly regarded as a data management issue, it's what you do with the data -- the analytics -- that really matters, they add.
Colin White, founder and president of Ashland, Ore.-based consultancy BI Research, and Harriet Fryman, director of business analytics software for IBM, aimed to address some of that incomplete thinking during their presentation at last month's 11th annual Pacific Northwest BI Summit.
"That's the problem: All the use cases we present are the eBays and the Yahoos and the petabytes and all of the Vs. But, to me, that's a small percentage [of use cases]," White said. "The key message is that it's not just about volume."
Big data defined by two roles
Early discussions about big data focused on data management issues. The more traditional approach of flowing data into an enterprise data warehouse (EDW) struggled to keep up with demand, White said. Some analysts suggest big data eventually will lead to the demise of the EDW, but summit attendees tended to embrace a more conservative outlook. Technologies that can better address some of the big data challenges -- analytic relational databases, nonrelational systems like Hadoop, and stream processing systems -- will sit alongside and work with the EDW, according to White. "People want to store more data for longer, but they can't afford to do it," he said. "One of the biggest use cases of big data is the data hub."
In these cases, businesses that want access to around 10 years of detailed data will split it between an EDW that stores the most recent data, and a second platform, such as Hadoop, that can store older data pretty cheaply, according to White.
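The split White describes can be pictured as a simple age-based routing rule. The sketch below is only an illustration under stated assumptions: the two-year cutoff, the tier names and the sample records are all made up for the example and do not come from White's presentation, and a real deployment would route data with its own warehouse and Hadoop tooling rather than in-memory lists.

```python
from datetime import date

# Hypothetical cutoff: how many years of recent data stay in the warehouse.
HOT_YEARS = 2


def choose_tier(record_date, today, hot_years=HOT_YEARS):
    """Return 'edw' for recent records and 'archive' for older ones."""
    age_days = (today - record_date).days
    return "edw" if age_days <= hot_years * 365 else "archive"


def split_by_tier(records, today):
    """Group (record_date, payload) pairs by the tier they would land in."""
    tiers = {"edw": [], "archive": []}
    for record_date, payload in records:
        tiers[choose_tier(record_date, today)].append(payload)
    return tiers


# Made-up sample data spanning roughly 10 years.
today = date(2020, 1, 1)
records = [
    (date(2019, 11, 15), "order-1001"),
    (date(2016, 3, 2), "order-0417"),
    (date(2010, 11, 20), "order-0009"),
]
tiers = split_by_tier(records, today)
print(tiers)
```

Only the most recent record stays in the warehouse tier here; the two older ones go to the cheaper archive tier, which stands in for a platform such as Hadoop.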
Big data discussions, though, have sometimes failed to move beyond data management, beyond volume, White said, and they need to. Big data isn't defined solely by how data is managed; it's also about the insights businesses can glean from it. "It's the analytical issues we should be addressing, more than the management issues," he said. "It's what you do with the data that matters."
Analytics is vital, IBM's Fryman agreed, and like the data management piece, big data also is affecting how businesses approach analytics. Moving data from one place to another, for example, is not an efficient model when a company is working with large volumes of information or streaming data, she said. "The way we assemble information actually changes when we look at the data of many forms," she added.
The business analyst (the person who knows what kinds of questions to ask of the data or how to look at it) will need to take a more prominent role when processing the data. That's because big data questions tend to be business questions, "not structural or data management-style questions," Fryman said. Businesses also will need algorithmic applications to help sift through the data and distinguish between what's relevant and what's expendable. These applications will become even more vital than visualization tools, which have been called the key to big data analytics and which received a warm nod during White and Fryman's summit presentation. Technologies capable of handling bigger, faster, more distinct types of data make for a dense and complex environment that could become hard to visualize, she said.
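As a rough picture of what "sifting" might look like, the sketch below flags readings that sit far from the rest of a batch, using a plain z-score filter. This is one common technique chosen for illustration, not something Fryman described; the function name, the threshold of 2.0 and the readings themselves are assumptions made up for the example.

```python
from statistics import mean, stdev


def surface_outliers(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        # All readings are identical, so nothing stands out.
        return []
    return [
        (i, x)
        for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]


# Made-up readings: mostly routine values with one unusual spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.7, 20.2, 20.1, 19.7, 20.0]
flagged = surface_outliers(readings)
print(flagged)
```

Nine of the 10 readings are discarded as routine and only the spike is surfaced for a person to look at, which is the division of labor the paragraph above describes.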
"For a human to decide things, you're going to have to apply algorithmic analytics to be able to surface the relevant, the most important, the interesting information," Fryman said. "When we look at analyzing sensor data or social data, I would say there's a lot more volume with a lot less interest in it or value in it. It's finding the patterns in it that's important."
Data management + analytics
Coupled, data management and analytics can help businesses strike out in new directions, according to Fryman and White. Because they combine traditional data with new data sources to ask new questions, they put analytics into the hands of new users and uncover new opportunities. "This is where those of us in Silicon Valley see a ton of startups who, I would pose, would not be in business if it wasn't for their abilities to analyze big data," Fryman said, pointing to Mountain View, Calif.-based SST Inc. as an example.
A small business founded in 1995, SST specializes in gunshot detection technology, which has been deployed by police departments and government agencies. In essence, SST's product is a data service that hinges on its ability to consume and analyze data in real time, identifying gunfire and sounding the proper alert, Fryman explained. "The information itself is what [they're] producing," she said of these kinds of businesses in general. "It's not a byproduct, it's a coproduct."
But the misuse -- or overuse -- of the term big data has become a sticking point for the industry, summit attendees agreed. And it overshadows the bigger picture: The crux of big data isn't that it's something new; rather, it's that it extends what businesses are doing already. Big data is part of an evolution, experts say.
"This is important: We've got to lose this that it's all about multistructured data and volume," White said. "Many of these capabilities improve what we do today."