Forrester's Mike Gualtieri has a point of view on the role of the data scientist, a somewhat vague job title discussed in cutting-edge computing shops, on campuses and in corporate boardrooms. In his opinion, true data scientists are like other scientists -- they form hypotheses, and then they test them. Others may differ, he admitted.
"There are different definitions out there. Some people think a data scientist is someone who manages, collects and stores data. That is more in the computer scientist camp than in the analyst camp," said Gualtieri, an analyst covering predictive analysis issues at Forrester Research Inc. in Cambridge, Mass. While he stresses the need to find people who know how to collect the right information, his definition of a data scientist is perhaps narrower than the most prevalent ones.
"A lot of people think a data scientist is someone who can do Hadoop. But there is another definition that I subscribe to a little bit more. That is of a data scientist as a scientist," he continued. "The scientist determines which data to work on. If someone isn't doing the analysis, finding the algorithms and testing them, then they are not a data scientist."
Understanding the role of the data scientist is important, but it is just one of the key elements data managers must consider when planning a program to capitalize on predictive analytics, the technology driving the rising demand for data scientists. Modern predictive analysis, coupled with large amounts of varied data, enables firms to reduce risk and create unique customer experiences, Gualtieri wrote in a recent report titled The Forrester Wave: Big Data Predictive Analytics Solutions, Q1 2013.
Although predictive analytics based on advanced machine-learning and other algorithms requires new tools and skills, it offers valuable benefits over traditional data mining for business intelligence, he said.
"Old style business intelligence is really about reporting -- cleverly slicing and dicing in different ways that may offer insight. But predictive analytics [is] very different," Gualtieri said. "Predictive analytics use many types of algorithms: regression algorithms, neural nets, various tree algorithms; it is the algorithms that determine what data matters and the probability of an outcome."
The algorithms essentially do the work, he said. There is precedent for such algorithms in artificial intelligence work going back to the 1980s, but easier access to data, greater compute power and wider tool interoperability give today's predictive analytics a special boost.
Data managers may need to trust their data scientists on some of this, but they also need to understand important concepts such as "predictability" and "probability" to make intelligent decisions with the new software. They need to set business goals to get the most out of predictive analytics programs, but will find that difficult if they do not understand predictability, he said.
"It's hard for companies that don't understand prediction. Many people think prediction is something absolute, but predictive analytics is not binary," Gualtieri said. "The result of a predictive model is always the probability of something happening."
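To make that point concrete, here is a minimal sketch, not taken from the Forrester report: a one-feature logistic regression (one of the regression algorithms Gualtieri mentions) fitted by plain gradient descent. The feature ("hours of a genre watched"), the toy data and the function names are all hypothetical, chosen only for illustration. Note that the model's output is a probability between 0 and 1, not a yes/no verdict.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b for a single feature by batch gradient
    descent on the average log loss. Returns (w, b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For log loss, d(loss)/d(score) is simply (predicted - actual).
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: float, b: float, x: float) -> float:
    """Probability that the outcome happens (label 1) for input x."""
    return sigmoid(w * x + b)


# Hypothetical toy data: hours of a genre a viewer has watched, and whether
# they went on to watch a recommended title in that genre (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
watched = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(hours, watched)
for h in (0.5, 2.5, 5.0):
    print(f"{h} hours -> P(watch) = {predict_proba(w, b, h):.2f}")
```

Turning one of these probabilities into an accept-or-reject decision still requires someone to pick a cutoff (0.5, 0.8 or something else), and that choice depends on the business cost of being wrong rather than on the model itself.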
He said that Netflix, certainly one of the standouts among new Web-based companies, is a good example of the use of this kind of probability. The Netflix online movie service employs predictive models, using a recommendation engine to suggest films. As users know, that recommendation engine can be a surprisingly accurate prognosticator -- despite the occasional clunker choice.
When Netflix customers face such a recommendation, accepting or rejecting it is a simple action. For business leaders trying to, for example, choose among suggestions for improving strategy, it is tougher. "Humans have to determine what to do with the information," Gualtieri said.