There isn't one type of data scientist. Some started out as developers, others as IT analysts; some are jack-of-all-trades data creatives, and others are businesspeople turned data junkies.
That variety was on display at the TDWI Accelerate conference in Boston recently, where software developers, business analysts, IT pros, database administrators, an astrophysics major and even a lawyer converged to learn about data scientist skills, as well as the strategies they'll need to work in that role -- or at least to understand the growing number of people who do.
"If I'm going to work with the data science team, I need to know what they are talking about. I want to understand the concepts, and I want to be able to call BS from time to time," said David Kleiman, manager of litigation product development for a major legal information provider.
Kleiman's role in productization led him toward data science, as more and more clients want litigation analytics to be part of the e-discovery process to predict possible outcomes of cases. To get there, data scientists at the legal information company collect the data used to predict litigation outcomes with a high level of accuracy: the judge, the law firm, the parties involved, the court location and many other variables. Kleiman explained that they need to consider whether the data is clean and what's missing, and then get it all into the right format for analysis.
The data science grunt work
Indeed, anyone involved in data science projects knows much of the work involves wrestling the data into submission. There's so much data, in so many forms, from so many sources, and yet so much is missing.
Data scientist, with a median salary of $110,000 and more than 4,000 job openings as of January 2017, is ranked the No. 1 job in America by Glassdoor.com.
"Ninety percent of data science is ETL [extract, transform and load] -- the data wrangling aspects," said Michael Li, founder and CEO of The Data Incubator, a data science training organization, during a presentation at TDWI Accelerate. "Focusing just on the analytics layer is not going to deliver the value and the insights you are looking for."
Citizen data scientists who think it is as simple as feeding data into one of the many black boxes out there and turning whatever comes out the other side into colorful charts and graphs need to think again, data experts at TDWI said.
"You need to be sure the values make sense before you throw them into an algorithm," said Chloe Mawer, a senior data scientist at Silicon Valley Data Science, who presented a session on exploratory data analysis (EDA) at the conference.
Mawer said EDA is the first part of a data science project and is considered the most critical step because it ensures the data analysis will be as accurate as possible. EDA also lets data scientists identify patterns, develop hypotheses, test technical assumptions and choose which predictive model to use. It builds an intuition for the data as well, so that a data scientist can later identify erroneous data, she explained.
For business stakeholders, EDA ensures the results are technically sound, and ensures the right questions are being asked. It tests business assumptions, provides context and leads to insights. The mindset during EDA should be open, questioning and receptive to information that goes against expectations.
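As a rough illustration of the kind of sanity check Mawer describes, the Python sketch below counts missing and implausible values in one field before any modeling is done. It is not taken from the conference session: the `profile_column` helper, the `duration_days` field, the sample records and the 0-to-3,650-day range are all hypothetical, chosen only to show the idea.

```python
from statistics import median


def profile_column(rows, column, low, high):
    """Count missing and out-of-range values for one numeric column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    out_of_range = [v for v in present if not (low <= v <= high)]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "out_of_range": len(out_of_range),
        "median": median(present) if present else None,
    }


# Hypothetical records; field names and values are illustrative only.
cases = [
    {"case_id": 1, "duration_days": 120},
    {"case_id": 2, "duration_days": None},  # value never recorded
    {"case_id": 3, "duration_days": -5},    # impossible negative duration
    {"case_id": 4, "duration_days": 300},
]

# Assumed plausible range: zero days up to roughly ten years.
report = profile_column(cases, "duration_days", low=0, high=3650)
print(report)
```

A report like this does not fix anything by itself, but it surfaces the gaps and oddities that would otherwise pass silently into an algorithm.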
The data science mindset
The ability to push against corporate expectations requires leadership -- a key data scientist skill for defining best practices and driving the culture toward data-driven decision-making. Establishing goals and KPIs is also critical, Li said.
Data scientists must also be able to step back and identify data biases: where the data came from, who the data sources are and, just as importantly, what isn't included in the data.
"Think about customers you have lost, and what you might have done to keep them," Li said.
Another critical data scientist skill is the ability to articulate findings. Data scientists need to get out from behind the computer screen and communicate their results to other business units across the enterprise.
Experts advise companies to embed data scientists within business units, so every group benefits from data and the data sets are read with business context. "Having that feedback is critical in driving a data culture," Li said. "You don't want to create an ivory tower of experts."