Hal Varian, chief economist for Google, may want to add another entry to his CV: author of the quote heard ’round the high-tech industry. In 2009, Varian told McKinsey Quarterly: “I keep saying the sexy job in the next 10 years will be statisticians.” Since then, the language has been reproduced on websites and at conferences, and reaffirmed by Varian himself.
But Varian, an emeritus professor in the School of Information, the Haas School of Business and the Department of Economics at the University of California at Berkeley, didn’t mean statistician in the traditional sense. Instead, he was referring to an emerging role in analytics: the data scientist.
While the title and definition of a data scientist are still being debated, the time is ripe for businesses to explore their rich data sources and discover new correlations. And Google’s data scientists are doing just that. In 2008, the company launched “Google Flu Trends,” relating search queries to the number of people experiencing flu-like symptoms each day. In 2009, Varian and co-author Hyunyoung Choi released a paper on predicting initial claims for unemployment benefits.
SearchBusinessAnalytics.com recently sat down with Varian to talk about the dawning of the data scientist.
How would you define the term data scientist?
Hal Varian: That’s a tough question because it’s an evolving term, but I would say a specialist in data analysis.
What skill set does a person need to be considered a data scientist?
Varian: Database and data manipulation, or how to shuffle data around and move things from place to place; statistics and statistical analysis; machine learning; visualization, or how to present data in a meaningful way; and communication, or being able to describe what’s going on.
One of the qualities we’ve heard discussed in reference to this title is the ability to tell a data story, what some might even call an art. Why is this so important?
Varian: I think the point is not looking at data just for its own sake, but looking at data to help review decisions or how some process is functioning. There’s a saying that Mother Nature does not give up her secrets easily. That’s also true of data; you need to dig into it more and find a richer, more elaborate story.
When did you see the term emerge and what would you say has caused its emergence?
Varian: I did an interview with the New York Times about two years ago. I was talking to one of the reporters there about interesting tech trends, and I said the statistician is the dream job of the next decade. Inside Google, our whole role is manipulating information and data. It’s not just statistics -- that’s a piece of it -- but other skills have great importance as well. We were looking for terms to describe that set of skills … data scientists has emerged as one of those. We’ve always had a specialty in database and data manipulation, but this is putting the whole package together.
Is the rise of the data scientist intimately tied to the growing discussion of “big data”?
Varian: Oh, absolutely. In fact, one of the things companies have done over the last decade is put together data warehouses so they have a way to access data and internal applications. They’re realizing they have to invest in the human side of that as well. Now that we have all this data, how do we analyze it, how do we utilize it? Big data is kind of the driver of this.
It feels as though the data scientist is connected to companies like Google, Facebook or LinkedIn -- businesses that have a strong Internet presence. Is that coincidence?
Varian: Take a company like Walmart, which is noted for its use of data analytics skills and logistics management. I would say that’s data science as well in the retail industry. You want a way that the data can be collected automatically. It’s already true of the Internet, but it’s also true if you look at retail point-of-sale data devices, purchases recorded and authorized. The supermarket scanner, that’s another device. There are a lot of other businesses where data science is important, but it may not go by that name.
Open source tools have been mentioned by other data scientists we’ve spoken to. What’s the draw?
Varian: Open source has advantages in education. You don’t have to worry about getting licenses, students are exposed to it, and it allows for the dynamic development you see in lots of open source areas.
There appears to be a need and desire to hire more data scientists, but a lack of people who can fill these positions.
Varian: Well, I’ll tell you, there’s a scrambling going on to meet that need, especially given the financial situation at universities these days. They’re generally setting up disciplinary programs … they can put together a good program out of the existing skills [computer science, statistics] already there. It’s just a matter of pulling the package together.