Managing Hadoop projects: What you need to know to succeed
A comprehensive collection of articles, videos and more, hand-picked by our editors
McDonald’s Corp. introduced a new photo system to inspect color, shape and even sesame seed distribution of its hamburger buns; the United Nations has turned an eye toward Twitter, using it as an early warning system for war, impoverishment and health catastrophes; and Heritage Provider Network Inc. has collected 10,000 model submissions to better predict patient readmissions.
Deciphering how these organizations are connected may seem like a less than intuitive task, but a shared characteristic ties them together: They are successfully taking on one of the biggest buzzwords in the industry today. Because of that, all three made cameo appearances in a recent Gartner Inc. webinar called “Gaining Value from Big Data.”
“Hype isn’t always a bad thing,” Doug Laney, a research vice president for the Stamford, Conn., consultancy, said during the presentation. “It often acknowledges something interesting or important.”
Laney’s discussion veered away from the Hadoop-centric conversations that have also grown in popularity. Instead, he focused on what big data is and why it’s such a challenge.
What is big data?
Big data leapt from back burner to bubbling-pot status because of what Laney refers to as “the seven C’s” -- or the convergence of several analytics and business intelligence (BI) trends.
Laney’s seven include the increase of consumerization, or the blending of personal and business use of technology; a changing climate, where data usage has become a competitive differentiator; a culture shift from the art to the science of making decisions; a corporate belief that data-driven decisions are strategic; a push for more collaboration; capability and technology enhancements that make tasks like data integration and processing more efficient; and the expanse of content -- especially unstructured content.
Although intersecting trends may explain how big data became a big deal, simply defining the term has become a pain point for the industry. Vendors describe big data one way and analysts another, leaving more than a handful confused. Today, many consider the three V’s -- or growth in volume, velocity and variety -- to be the heart of big data.
“While these remain the core of what we believe are the main challenges,” Laney said, “we’ve gone on to identify a whole dozen different sets of characteristics.”
According to Laney, a more thorough description includes not only what’s happening to data, but also how businesses turn data into intelligence. That means taking into account the new types of hardware needed to process big data so businesses can gain insights, make decisions or automate processes.
“This [description] acknowledges both the challenge and the value of big data,” he said.
Technology and beyond
Today’s data environment is further complicated by the number of places businesses can source data from. Laney claims that number has grown more than fivefold compared with 10 or 15 years ago.
“Most of the data is available at either no charge or at a nominal expense,” he said.
He includes data already part of the ecosystem that could be used to build better relationships with customers, partners, employees and suppliers; third-party data such as credit ratings; public data from, say, the government; social media data; and dark data -- data that, according to Laney, has served its primary purpose and is now stored, long forgotten, for compliance reasons.
Laney recommends that businesses begin to take advantage of new -- and not-so-new -- data sources by starting with the untapped data within the organization and then moving on to external sources, which can yield new insights.
“Taking it on faith that there was some value here may have been OK when we were dealing with smaller data,” Laney said. “Now we need to think about driving and measuring information value.”
According to Gartner, 85% of Fortune 500 organizations will be unable to exploit big data for competitive advantage through 2015.
To do so, businesses will need to consider how they’ll store and manage data, which will most likely require an investment in new technology. Today, much of the big data discussion is centered on that technology, like Hadoop, but businesses will be faced with questions that delve beyond storage needs if they’re going to discover insights, Laney said. He recommends that businesses also consider strategies for data federation, data virtualization, self-service analytics and master data management.
Companies taking on increasing volumes of data will also face an increased risk of mishandling sensitive information. Laney believes businesses should consider data governance programs and data stewards to help define policies and procedures. He even recommends purchasing electronic data insurance.
Management issues aside, businesses will also need tools to analyze that data, which could require further investment in both software and skills. Both, Laney said, will push the business beyond the boundaries of basic BI.
“We see leading organizations looking beyond basic business intelligence to consider predictive analytics; data, text, even multimedia mining,” he said. “And increasingly illustrative and layered forms of visualizations, complex event processing, rule engines and natural language queries.”
According to Gartner, business analytics needs will drive 70% of investments in the expansion and modernization of information infrastructure through 2015.
But for a big data environment to really thrive, businesses will need strong leadership, something that Laney considers to be just as important as technical skills.