Guide to big data analytics tools, trends and best practices
A comprehensive collection of articles, videos and more, hand-picked by our editors
With new terms, new skill sets, new products and new providers, the world of “big data” analytics can seem unfamiliar, but tried-and-true data management best practices do hold up well in this still-emerging discipline.
As with any business intelligence (BI) and data warehouse initiative, experts say it's critical to have a clear understanding of an organization's data management requirements and a well-defined strategy before venturing too far down the big data analytics path. Big data analytics is widely hyped, and companies across all sectors are being flooded with new data sources and ever-larger amounts of information. Yet one of the most serious missteps for would-be users is making a big investment to attack the big data problem without first figuring out how doing so can really add value to the business.
“Don’t get too hung up on the technology -- start from a business perspective and have the conversation between the CIO, data scientists and businesspeople to figure out what the business objectives are and what value can be derived, and drive backwards from there,” said David Menninger, an analyst at Ventana Research Inc. who focuses on BI, analytics and information management technologies.
Defining exactly what data is available and mapping out how an organization can best leverage those resources is a key part of that exercise. CIOs, IT managers and BI and data warehouse professionals need to examine what data is being retained, aggregated and utilized and compare that with what data is being thrown away, Menninger said. It’s also critical, he added, to consider external data sources that are currently not being tapped but could be a compelling addition to the mix.
Even if companies aren't sure how and when they plan to jump into big data analytics, there are benefits to going through this kind of evaluation sooner rather than later, according to Menninger. And beginning the process of capturing data can also leave organizations better prepared for the eventual leap. "Even if you don't know what you're going to use it for, start capturing the information," he said. "Otherwise, there is a missed opportunity, because you won't have that rich history of information [to draw on]."
Start small with big data…
Analyzing big data sets is yet another instance where it makes sense to define small, high-value opportunities and use them as a starting point. As companies expand the data sources and types of information they’re looking to analyze, and start to create the all-important analytical models that can help them uncover patterns and correlations in both structured and unstructured data, they need to be vigilant about homing in on the findings that are most important to their stated business objectives.
“If you end up in a place where all you’re doing is looking for new patterns and you can’t do anything with them, you’ve hit a dead spot,” said Gartner Inc. analyst Yvonne Genovese.
ComScore Inc., a Reston, Va.-based company that tracks Internet usage and provides Web analytics and marketing intelligence services to corporate customers, knew early on that it would need some sort of big data strategy. But it picked very targeted spots and built out its big data analytics program over time.
“We started with small bites -- taking individual [data] flows and migrating them into different systems,” said Will Duckworth, comScore’s vice president of software engineering. “If you’re working with any kind of scale, you can’t roll something like this out overnight.”
Scale is something comScore is very conscious of, given the amount of data the company processes. Back in 2009, when it started collecting 300 million records a day, Duckworth began searching in earnest for a new set of systems and a technology infrastructure that could handle comScore’s data processing needs -- now totaling 23 billion records a day and still growing -- in a far more cost-efficient fashion.
…but don’t forget to think big
By combining open source Hadoop technologies with emerging packaged analytics tools, Duckworth has been able to make the Hadoop environment more familiar to business analysts trained in using SQL. He says companies need to consider scale as a primary factor when mapping out a big data analytics roadmap.
“You have to consider what the ramp-up will look like -- how much data will you be putting in six months from now, how many more servers will you need to handle that, is the software up to the task,” he explained. “People don’t think about how much it is going to grow or how popular the solution might be once it’s rolled into production.”
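The ramp-up arithmetic Duckworth describes can be sketched as a simple back-of-the-envelope projection. The figures below are hypothetical assumptions for illustration only, not comScore's actual numbers; only the idea of projecting data growth and server needs comes from the article.

```python
import math

def projected_daily_records(current: float, monthly_growth: float, months: int) -> float:
    """Project daily record volume assuming compound monthly growth."""
    return current * (1 + monthly_growth) ** months

def servers_needed(daily_records: float, records_per_server: float) -> int:
    """Round up to the number of servers required to handle a given load."""
    return math.ceil(daily_records / records_per_server)

# Hypothetical scenario: 300 million records a day today, growing 15% a month,
# with each server able to process 50 million records a day.
six_months_out = projected_daily_records(300e6, 0.15, 6)
print(f"Projected daily records in 6 months: {six_months_out:,.0f}")
print(f"Servers needed at that point: {servers_needed(six_months_out, 50e6)}")
```

Even a rough model like this forces the questions Duckworth raises -- how much data in six months, and how many more servers -- before a system is rolled into production.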
The other thing companies commonly lose sight of as they get enveloped in the “new normal” that is big data is that the “old normal” rules around data management still apply.
“Information governance practices are just as important today with the notion of big data as they were yesterday with data warehousing,” said Marcus Collins, another Gartner analyst. “Even though companies want flexibility in terms of processing, remember that information is a corporate asset and should be treated as such.”
Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for 25-plus years for a variety of trade and business publications and websites.