The biggest challenges related to “big data” analytics, according to consultants and IT managers, boil down to a simple one-two punch: The technology is still
A lot of big data technologies -- like Hadoop and MapReduce -- hail from the open source world, developed by Internet pioneers such as Google and Yahoo to take on the problem of cost-effectively processing large volumes of information, including both structured and unstructured data. As a result of this orientation, most of the technologies lack the maturity and accessibility of traditional databases and data management suites, and there is still a limited selection of complementary analytics tools available to make these environments feel familiar to many data warehousing and analytics professionals.
“There’s a steep learning curve to all this, with a lot of new technologies and unwritten lore as to how to make things work,” said Ron Bodkin, CEO of Think Big Analytics, a Mountain View, Calif.-based consulting firm specializing in big data analytics. “The majority of people are used to working with relational database management systems, which have a different model of storing and processing data.”
While data management teams typically have well-defined expertise in managing and organizing highly structured data and in modeling and creating reports in SQL, those conventional skill sets don’t translate well to the unstructured, flat-file part of the big data world, where command lines and NoSQL database technologies are the core building blocks of most of the emerging platforms.
“You have to be willing to get your hands dirty,” said Will Duckworth, vice president of software engineering at comScore Inc., a Reston, Va.-based provider of Web analytics and marketing intelligence services that has developed and implemented a big data analytics strategy in recent years. “This isn’t a fully shrink-wrapped product where you open the box, install it on servers and it runs fine. You need a good set of system administrators and solid practices around how to build out these environments.”
Bring on the Ph.D.s
Much of what big data analytics brings to the table is based on predictive modeling or a look into future trends. But the discipline of developing the models for predictive analytics applications isn’t within the skill set of the average business user or even the traditional business intelligence (BI) data analyst. In addition, much of the data is in a raw form, from sources such as Web activity logs or sensors. Thus, companies need access to a cadre of experts who are versed in statistical and mathematical principles to build advanced analytical models that can uncover trends and hidden patterns and actually make big data useful.
“Not only do you need the IT operational skills to be able to realize value, the biggest shortage we see around big data is data scientists -- people with Ph.D.s in statistics,” said Brian Hopkins, a principal analyst at Forrester Research Inc. in Cambridge, Mass. “Most of the data is raw -- it’s not something you can read and get value out of. There will always be a need for a skill set of people who know what to do with the raw information, and you have to build the acquisition of talent into the business case.”
At comScore, where the company’s business model is predicated on crunching through volumes of Web data to unearth trends for customers, many analytics users are trained in predictive modeling and are also technically savvy enough to understand the impact of a particular query on overall system performance. Others, however, didn’t possess that level of expertise, Duckworth said. So comScore has invested time and money in re-education efforts to orient them to think about the scale of the data and to spend time considering such details as data partitioning and load size when they’re building models and queries.
At the same time, the company has designed its big data system with checks and balances. For example, if someone tries to run a query that could potentially crash the cluster, the system pops up a note to ensure that the user is fully aware of the ramifications of the planned job. “At scale, things break pretty fast,” Duckworth said.
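The article does not describe how comScore's safeguard is built. As a rough illustration only, a pre-flight check of this kind might compare an estimate of the data a job will scan against a cluster limit and ask the user to confirm before proceeding. Every name and number here (`JobEstimate`, `preflight_warning`, `submit`, the 5 TiB threshold) is a hypothetical chosen for the sketch, not part of comScore's system or of Hadoop.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical limit, for illustration only; a real cluster would
# derive this from its actual capacity and current load.
MAX_SCAN_BYTES = 5 * 1024**4  # 5 TiB


@dataclass
class JobEstimate:
    """Rough size of a planned job (all fields are assumptions)."""
    scan_bytes: int   # estimated bytes the query will read
    partitions: int   # number of data partitions the query touches


def preflight_warning(job: JobEstimate,
                      max_scan_bytes: int = MAX_SCAN_BYTES) -> Optional[str]:
    """Return a warning message if the job looks risky, otherwise None."""
    if job.scan_bytes > max_scan_bytes:
        tib = job.scan_bytes / 1024**4
        return (f"This job would scan about {tib:.1f} TiB across "
                f"{job.partitions} partitions. Continue?")
    return None


def submit(job: JobEstimate, confirm: Callable[[str], bool]) -> bool:
    """Let the job proceed only if it is small or the user confirms.

    Returns True when the job may run, False when the user backs out.
    """
    warning = preflight_warning(job)
    if warning is not None and not confirm(warning):
        return False
    return True


if __name__ == "__main__":
    big_job = JobEstimate(scan_bytes=10 * 1024**4, partitions=400)
    # Simulate a user who declines after reading the warning.
    print(submit(big_job, confirm=lambda message: False))
```

Passing the confirmation step in as a function keeps the size check separate from whatever interface shows the note, whether that is a command-line prompt or a pop-up.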
ComScore has also brought in a packaged application that adds a SQL-like environment to its Hadoop big data analytics environment, so it feels more familiar to mainstream users.
Training was also an integral part of the big data analytics strategy for Zions Bancorporation, a commercial bank holding company based in Salt Lake City that has deployed big data technology to help it do modeling and risk management for various loan portfolios. Yet the training wasn’t just about learning Hadoop skills or serving as a crash course in statistical science. Rather, a considerable amount of time and energy went into acclimating members of the technical team so they were able to comfortably transition to a totally new way of managing data.
“This is new technology that traditional and very conservative IT shops may be reluctant to implement,” said Clint Johnson, who until recently was senior vice president of data warehousing, BI and analytics at Zions. “You have systems administrators or database administrators who’ve built an entire career around a particular skill set, and then you thrust some new technology at them and say they have to learn it. There are cultural challenges you have to deal with in terms of supporting the new model.”
Beth Stackpole is a freelance writer who has been covering the intersection of technology and business for 25-plus years for a variety of trade and business publications and websites.