Articles and case studies written about “big data” analytics programs proclaim the successes of organizations, typically with a focus on the technologies used to build the systems. Product information is useful, of course, but it’s also critical for enterprises embarking on these projects to be sure they have people with the right skills and experience in place to successfully leverage big data analytics tools.
Knowledge of new or emerging big data technologies, such as Hadoop, is often required, especially if a project involves unstructured or semi-structured data. But even in such cases, Hadoop know-how is only a small portion of the overall staffing requirements; skills are also needed in areas such as business intelligence (BI), data integration and data warehousing. Business and industry expertise is another must-have for a successful big data analytics program, and many organizations also require advanced analytics skills, such as predictive analytics and data mining capabilities.
Data scientist is a new job title that is getting some industry buzz. But it’s really a combination of skills from the areas listed above, including predictive modeling, statistical and business analysis, and data integration development. I’ve seen some job postings for data scientists that require a degree in statistics as well as an industry-specific one. Having both degrees and matching professional experience is great, but what are the chances of finding job candidates who do? Pretty slim!
Taking a more realistic look at what to plan for, I describe below the key skill sets and roles that should be part of big data analytics deployments. Instead of listing specific job titles, I’ve kept it general; organizations can bundle together the required roles and responsibilities in various ways based on the capabilities and experience levels of their existing workforce and new employees that they hire.
Business knowledge. In addition to technical skills, companies engaging in big data analytics projects need to involve people with extensive business and industry expertise. That must include knowledge of a company’s own business strategy and operations as well as its competitors, current industry conditions and emerging trends, customer demographics and both macro- and microeconomic factors.
Much of the business value derived from big data analytics comes not from textbook key performance indicators but rather from insights and metrics gleaned as part of the analytics process. That process is part science (i.e., statistics) and part art, with users doing what-if analysis to gain actionable information about an organization’s business operations. Such findings are only possible with the participation of business managers and workers who have firsthand knowledge of business strategies and issues.
Business analysts. This group also has an important role to play in helping organizations to understand the business ramifications of big data. In addition to doing analytics themselves, business analysts can be tasked with things such as gathering business and data requirements for a project, helping to design dashboards, reports and data visualizations for presenting analytical findings to business users and assisting in measuring the business value of the analytics program.
BI developers. These people work with the business analysts to build the required dashboards, reports and visualizations for business users. In addition, depending on internal needs and the BI tools that an organization is using, they can enable self-service BI capabilities by preparing the data and the required BI templates for use by business executives and workers.
Predictive model builders. In general, predictive models for analyzing big data need to be custom-built. Deep statistical skills are essential to creating good models; too often, companies underestimate the required skill level and hire people who have used statistical tools, including models developed by others, but don’t have knowledge of the mathematical principles underlying predictive models.
Predictive modelers also need some business knowledge and an understanding of how to gather and integrate data into models, although in both cases they can leverage other people’s more extensive expertise in creating and refining models. Another key skill for modelers to have is the ability to assess how comprehensive, clean and current the available data is before building models. There often are data gaps, and a modeler has to be able to close them.
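As a rough illustration of that kind of pre-modeling check, here is a minimal Python sketch that scores a small data set on completeness, duplication and freshness. Everything in it is an assumption made for the example: the function name profile_records, the field names, the sample records and the 30-day freshness threshold are all hypothetical, and a real assessment would cover far more than these three measures.

```python
from datetime import date


def profile_records(records, required_fields, date_field, as_of, max_age_days):
    """Summarize completeness, duplication and freshness for a list of dict records.

    Hypothetical helper for illustration only; not a standard library API.
    """
    if not records:
        raise ValueError("no records to profile")
    total = len(records)

    # Comprehensive: share of records with a non-empty value in each required field.
    completeness = {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in required_fields
    }

    # Clean (one narrow aspect): count records that exactly repeat an earlier one.
    seen = set()
    duplicates = 0
    for r in records:
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    # Current: share of records updated within max_age_days of the as_of date.
    # Records with no date at all are counted as stale.
    fresh = sum(
        1
        for r in records
        if r.get(date_field) is not None
        and (as_of - r[date_field]).days <= max_age_days
    )

    return {
        "completeness": completeness,
        "duplicates": duplicates,
        "fresh_share": fresh / total,
    }


# Made-up sample data: one duplicate row, two missing regions, one stale record.
sample = [
    {"customer_id": 1, "region": "East", "updated": date(2024, 1, 10)},
    {"customer_id": 2, "region": "", "updated": date(2023, 6, 1)},
    {"customer_id": 1, "region": "East", "updated": date(2024, 1, 10)},
    {"customer_id": 3, "region": None, "updated": date(2024, 1, 5)},
]

report = profile_records(
    sample,
    required_fields=["customer_id", "region"],
    date_field="updated",
    as_of=date(2024, 1, 15),
    max_age_days=30,
)
print(report)
```

A report like this only surfaces the gaps; deciding whether to fill them, drop the affected records or go back to the source systems still takes the modeler’s judgment and input from the business side.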
Data architects. A big data analytics project needs someone to design the data architecture and guide its development. Typically, data architects will need to incorporate various data structures into an architecture, along with processes for capturing and storing data and making it available for analysis. This role involves traditional IT skills such as data modeling, data profiling, data quality, data integration and data governance.
Data integration developers. Data integration is important enough to require its own developers. These folks design and develop integration processes to handle the full spectrum of data structures in a big data environment. In doing so, they ideally ensure that integration is done not in silos but as part of a comprehensive data architecture.
It’s also best to use packaged data integration tools that support multiple forms of data, including structured, unstructured and semi-structured sources. Avoid the temptation to develop custom code to handle extract, transform and load operations on pools of big data; hand coding can increase a project’s overall costs -- not to mention the likelihood of an organization being saddled with undocumented data integration processes that can’t scale up as data volumes continue to grow.
Technology architects. This role involves designing the underlying IT infrastructure that will support a big data analytics initiative. In addition to understanding the principles of designing traditional data warehousing and BI systems, technology architects need to have expertise in the new technologies that often are used in big data projects. And if an organization plans to implement open source tools, such as Hadoop, the architects need to understand how to configure, design, develop and deploy such systems.
It can be exciting to tackle a big data analytics project and to acquire new technologies to support the analytics process. But don’t be lulled into thinking that tools alone spell big data success. The real key to success is ensuring that your organization has the skills it needs to effectively manage large and varied data sets as well as the analytics that will be used to derive business value from the available data.
About the author:
Rick Sherman is the founder of Athena IT Solutions, which provides consulting, training and vendor services on business intelligence, data integration and data warehousing. Sherman has written more than 100 articles and spoken at dozens of events and webinars; he also is an adjunct faculty member at Northeastern University’s Graduate School of Engineering. He blogs at The Data Doghouse and can be reached at firstname.lastname@example.org.