
Jobs in data science may seem glamorous, but they require dirty work

The role of a data scientist is often seen as one of today's most glamorous and exciting jobs, but behind the glitz and acclaim lies a lot of toil and hard work.

It was nearly five years ago that author and speaker Tom Davenport and high-profile analytics manager D.J. Patil jointly declared data scientist the sexiest job of the 21st century, and their prediction is looking more accurate by the day.

Data scientists are behind some of today's hottest technologies, like self-driving cars and artificial intelligence, and businesses hope to cash in on these transformative technologies.

But just because jobs in data science are exciting in a general sense doesn't mean every task in the day-to-day responsibilities of data science is all that rousing.

"Every fantastic company that is currently discovering data science right now is going to have a hard time figuring out what data scientists are here to do," said Angela Bassa, director of data science at iRobot Corp., a maker of house-cleaning robots based in Bedford, Mass.

In a presentation at the recent Open Data Science Summit in Boston, Bassa said that companies are often filling jobs in data science even though they aren't yet sure what projects they want people to work on. This means many newly hired data scientists find themselves working on things like optimizing legacy systems or processing ad hoc data requests for lines of business. That isn't the sexy stuff Davenport and Patil envisioned data scientists doing back in 2012, but it can be important.

A new twist on old applications

Rather than turn their noses up at these types of jobs, data scientists should put their mark on them, Bassa said. Optimizing legacy applications by embedding modern data science may not help an enterprise create a new business model, but it can improve day-to-day operations significantly.

"The best work happens when teams improve and productize legacy applications and make older things more innovative," Bassa said. "Why not innovate on these applications?"

Another important, yet less exciting, aspect of jobs in data science today is maintaining data quality. In some cases, this will fall to data engineers, but enterprises that look for the unicorn data scientists -- those with the full combination of statistical, technological and business domain skills -- may expect them to take responsibility for this area. There's also the fact that modern machine learning practices demand specific types of data, so data scientists should have a hand in ensuring their data is right.

"Data quality is critical to us, and really to any machine learning application," said Jasjeet Thind, vice president of data science and engineering at real estate listing site Zillow, in another presentation at the conference.

Good data science starts with good data

Seattle-based Zillow uses machine learning algorithms for things like creating personal listing recommendations for users, ad targeting, calculating mortgage pricing and forecasting housing trends. One of its primary machine learning use cases is Zestimate, a proprietary model that estimates how much a home should be worth based on a variety of features in a listing. The model incorporates a deep learning feature that views listing images to identify the condition of a property.

Thind said good data is crucial to all of this. Working on recommendation engines and deep learning computer vision models may be the exciting stuff, but you can't start those projects if you don't have high-quality data. So the company's data science team maintains its own internal analytical models that review data sets for outliers, missing values and other potential defects. These models then alert data owners whenever something is amiss.

"With machine learning, you're talking about petabytes of data, streaming data, batch data, and you need to be able to detect problems in your data at scale," Thind said.
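The kind of automated check Thind describes can be illustrated in a few lines. This is a minimal sketch, not Zillow's system. It assumes that each data set arrives as named numeric columns, with `None` marking a missing entry. It uses the common interquartile-range (IQR) rule as a stand-in outlier test. The `Alert` record, the `check_dataset` function and the "listings-team" owner are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Optional


@dataclass
class Alert:
    """Hypothetical data-quality alert addressed to a data set's owner."""
    owner: str
    column: str
    issue: str
    count: int


def missing_count(values: list[Optional[float]]) -> int:
    """Count missing (None) entries in a column."""
    return sum(1 for v in values if v is None)


def iqr_outliers(values: list[Optional[float]], k: float = 1.5) -> list[float]:
    """Return non-missing values outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is an assumed, illustrative choice of outlier test.
    """
    present = [v for v in values if v is not None]
    if len(present) < 4:
        return []  # too few points for a meaningful quartile estimate
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def check_dataset(owner: str, columns: dict[str, list[Optional[float]]]) -> list[Alert]:
    """Scan every column for missing values and outliers, collecting alerts."""
    alerts = []
    for name, values in columns.items():
        n_missing = missing_count(values)
        if n_missing:
            alerts.append(Alert(owner, name, "missing values", n_missing))
        n_outliers = len(iqr_outliers(values))
        if n_outliers:
            alerts.append(Alert(owner, name, "outliers", n_outliers))
    return alerts


if __name__ == "__main__":
    # Made-up listing data: one implausible square footage, some gaps.
    listings = {
        "sqft": [1500, 1600, 1550, 1580, 1620, 15000, None],
        "price": [300000, 310000, None, None, 305000, 299000, 302000],
    }
    for alert in check_dataset("listings-team", listings):
        # In a real pipeline this would notify the owner instead of printing.
        print(f"[{alert.owner}] {alert.column}: {alert.count} {alert.issue}")
```

At the petabyte scale Thind mentions, checks like these would presumably run inside a distributed batch or streaming framework rather than over in-memory lists. The per-column logic could stay much the same.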
