Structuring a big data strategy
What kind of price tag can you put on the value of big data? Try a whopping $300 billion in potential annual value to the U.S. health care industry, a possible 60% increase in a retailer's operating margins and $149 billion in savings from operational efficiency improvements for the developed economies of Europe, and that’s just a start.
These big-time examples, touted in a recent McKinsey Global Institute research study titled “Big data: The next frontier for innovation, competition, and productivity,” may seem a tad overreaching, but they point to the scale of what’s possible with effective big-data analytics initiatives. Organizations at the forefront of mining the data pouring in say they can unlock insights that were never possible with traditional data warehouse and business intelligence (BI) tools, and those efforts are proving to be a boon for business.
While there are benefits to traditional after-the-fact BI analysis, big-data analytics paves the way for organizations to comb through the minutiae in myriad data sources to model trends and uncover unknown variables that can have a huge impact on advancing strategy, boosting efficiency and reducing costs.
“When you look at the new proposition and opportunity around truly monstrous data, you tend to look at advanced analytics,” said Shawn Rogers, vice president of research for BI and data warehousing at consulting firm Enterprise Management Associates Inc. “You’re talking about tools that let you build models and algorithms to parse through data, look for patterns and find the nuggets of gold.”
In some sense, what Rogers is referring to is predictive analytics on steroids. Instead of creating algorithms and models using only historical, structured information from a transactional data warehouse, big-data analytics opens up the practice to the sea of unstructured data pouring in from the Web, social media feeds and machine-generated content like that from radio frequency identification (RFID) tags or sensors.
Mining real-time data from Twitter feeds and mashing it up with brick-and-mortar sales information can deliver critical market intelligence, helping companies understand trends and consumer sentiment, predict churn, respond to customer issues and adjust strategies in a more timely fashion. At the same time, tracking product movement on a global basis with RFID tags allows retailers to understand purchasing behavior, helping them better manage inventory, optimize product portfolios and prepare for peak buying cycles. And by capturing and evaluating every detail of the data emanating from complex machinery, manufacturers can correct defects before they cause problems for customers and improve product quality over time.
The best of both worlds
While applying such forward-thinking analytics to big-data sets has obvious appeal, the practice doesn’t come easily, and it doesn’t supplant the role of traditional data warehouse and BI efforts, experts say. On the contrary, most companies in the early and even middle stages of big-data analytics adoption are combining the two approaches when and where it makes sense.
“There are places for the traditional things associated with high-quality, high-reliability data in data warehouses, and then there’s the other thing that gets us to the extreme edge when we want to look at data in the raw form,” explained Yvonne Genovese, a vice president and distinguished analyst at Stamford, Conn.-based Gartner Inc.
In the hybrid scenario, companies will continue to push conventional BI out to mainstream business users to do ad hoc queries and reports. They will supplement that effort with a big-data analytics environment optimized to handle a torrent of unstructured data and tuned to the needs of data scientists for building complex predictive models. For example, an analytics sandbox might archive three to four years of raw information used for pattern matching and modeling.
The environments don’t necessarily have to be information silos, either. In fact, an increasingly popular model is to use the open source Hadoop framework, with its distributed file system, as a “dirty” operational store for extreme extract, transform and load (ETL) processes.
“Hadoop is not only an interesting analytic tool, it’s an interesting ETL tool,” said Marcus Collins, a research director at Gartner. “One model that is becoming quite popular is for organizations to take the fire hose that is Twitter, triage the data using Hadoop and put the aggregated results back into the data warehouse for further analysis.”
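The triage pattern Collins describes can be sketched in miniature. The example below is a hypothetical, single-machine Python illustration, not Hadoop code: the post records, their field names and the hashtag_daily table are assumptions made for illustration, a plain map function and a dictionary-based reduce stand in for MapReduce jobs, and an in-memory SQLite database stands in for the data warehouse. What it aims to show is the shape of the flow: raw, messy records are filtered and aggregated first, and only the compact summary rows are loaded into the warehouse.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

# Hypothetical raw records standing in for a social media "fire hose".
# The field names ("date", "text") are assumptions for illustration only.
RAW_POSTS = [
    {"date": "2012-03-01", "text": "Loving the new #phone from @acme"},
    {"date": "2012-03-01", "text": "#phone battery died again #fail"},
    {"date": "2012-03-02", "text": "Store was out of #phone stock"},
    {"date": "2012-03-02", "text": "no hashtags in this one"},
    {"text": "#phone post with no date"},  # malformed record: no date field
]


def map_post(post):
    """Map step: emit ((date, hashtag), 1) per hashtag; drop malformed records.

    Uses simple whitespace tokenization, so punctuation attached to a
    hashtag would be kept as part of it.
    """
    date = post.get("date")
    if not date:
        return []
    words = post.get("text", "").split()
    return [((date, w.lower()), 1) for w in words if w.startswith("#") and len(w) > 1]


def reduce_counts(pairs):
    """Shuffle and reduce step: sum the counts for each (date, hashtag) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return totals


def load_into_warehouse(conn, totals):
    """Load only the aggregated rows, not the raw posts, into a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashtag_daily "
        "(day TEXT, hashtag TEXT, mentions INTEGER)"
    )
    conn.executemany(
        "INSERT INTO hashtag_daily VALUES (?, ?, ?)",
        [(day, tag, n) for (day, tag), n in sorted(totals.items())],
    )
    conn.commit()


if __name__ == "__main__":
    totals = reduce_counts(chain.from_iterable(map(map_post, RAW_POSTS)))
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    load_into_warehouse(conn, totals)
    for row in conn.execute(
        "SELECT day, hashtag, mentions FROM hashtag_daily ORDER BY day, hashtag"
    ):
        print(row)
```

Running it prints three summary rows, such as ('2012-03-01', '#phone', 2). The design point is that the warehouse receives a handful of clean aggregates rather than five raw posts, one of them malformed; in a real deployment the map and reduce steps would run as distributed jobs over far larger volumes.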
In some ways, creating a blend of the two environments brings companies closer to where they’ve been trying to go with traditional data warehousing for some time.
“This augments the historical views of where we’ve been with predicting where we want to go,” Rogers said. “That has always been the holy grail of BI and what we’ve always wanted.”
ABOUT THE AUTHOR
Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for 25-plus years for a variety of trade and business publications and websites.