Data scientists are a new type of analyst: part data engineer, part statistician and part experienced business analyst. And they're in high demand: Companies are combing through resumes and job websites, interviewing recent university grads and poaching from their competitors in an effort to bring these new talents into their organizations.
Of course, we've had statistical analysts -- or "quants" -- in our organizations for years. They're often doctorate-bearing, white-jacketed folks who spend their days in the rarefied towers of the back office. Unfortunately, while they are great at analyzing data, they are not always the best at explaining their findings to corporate executives and lower-level business workers in understandable terms. In many cases, it is nearly impossible to get clear answers from them to questions such as the following:
- I see the trends, but why are they important and what should I do to change them?
- What impact will changing customers' buying behaviors have on our revenues?
- How do we change the causes of negative effects to produce more positive results?
These experts also find it difficult to access the right data in the right format and at the right time to perform their analytical explorations and investigations. Often, they have to rely on IT to provide the data extracts, the supporting analytical technology and the high-performance systems infrastructure their jobs depend on. Many still find it frustrating to have an intermediary between them and "their" data.
Enter the data scientists
Data scientists may be the people who can finally bridge the gap between doing advanced data analysis and using the findings of that analysis to produce business results that align with an organization's strategic goals. But what exactly is a data scientist?
To answer that question, we first need to understand the different "information workers" found in our enterprises. These employees can be segmented into three broad categories:
- Business intelligence (BI) and data warehouse builders. Traditionally, these are the people responsible for designing and implementing BI systems. However, because of budget, resource or priority issues, they are often perceived as the bottleneck in deploying BI and analytics tools. They typically come from central or line-of-business IT staffs or are technologically savvy business users.
- Information consumers. These workers use and apply BI results to support day-to-day business operations. They need BI and analytics findings to increase knowledge and help them make sound business decisions. But they rarely have the experience or inclination to create the required information themselves.
- Information producers. These are the folks who generate BI results for information consumers. Information producers identify potential new business opportunities, analyze or investigate data, and create actionable BI and analytics models. We often know them as power users or business analysts, but increasingly they are also the much-sought-after data scientists.
One of the pundits promoting the use of data scientists in companies is D.J. Patil, former chief scientist and head of data products at LinkedIn Corp. and now a "data scientist in residence" at venture capital firm Greylock Partners. In his book, Building Data Science Teams, Patil says good data scientists have a combination of technical expertise (deep proficiency in some scientific discipline), curiosity (a desire to discover problems and distill them into a clear set of hypotheses that can be tested), storytelling ability (a knack for using data to tell its story and communicating that story to others) and cleverness (the ability to think outside the box and approach problems in creative ways).
To clarify further, we have consolidated data scientist skills into three categories:
- Business knowledge. Data scientists must be subject-matter experts with strong investigative capabilities.
- Modeling and analysis skills. They also must be trained in areas such as statistics, machine learning and data visualization and be able to create the models and programs needed to perform data analysis activities.
- Data engineering skills. In addition, data scientists must be adept at data engineering, including the ability to mash up or blend large amounts of data.
You may ask: Why the sudden interest in data scientists? Perhaps it is because we can now do things with our data that were not technologically possible before. Certainly, the introduction of "big data" has spurred innovation in many areas, including data storage and analytics applications and infrastructures. We now have analytical platforms that can be used to store and analyze massive sets of data, not just small samples or subsets of information. We also are seeing advances in applications that allow very sophisticated analyses to be performed with ease.
We believe that the meteoric rise of the data scientist has resulted from these improvements, and that companies worldwide can benefit from the incorporation of these highly skilled, and potentially highly valuable, employees into their ranks.
About the authors:
Claudia Imhoff, Ph.D., is president of Intelligent Solutions, a consultancy that focuses on business intelligence technologies and strategies. Imhoff serves as an adviser to many companies, universities and leading technology vendors. She is also the founder of the Boulder BI Brain Trust, a consortium of independent analysts and consultants.
Colin White is founder of BI Research and president of DataBase Associates Inc. As an analyst, educator and writer, he offers advice on data management, information integration and business intelligence technologies and how they can be used to build smart and agile businesses.