Analytics teams give data science applications real scientific rigor

Data scientists at companies such as LinkedIn and Cisco are applying aspects of the scientific method to data mining and analysis initiatives to try to make sure they get valid results.

Data science may not be a formal scientific discipline, but analytics teams are increasingly treating it like one to help ensure that data science applications produce accurate and meaningful information.

For example, LinkedIn Corp.'s data science team works with product managers, application developers and other business users to define quantitative metrics for analyzing tests of planned new features on the social networking company's website. "Everything we do at LinkedIn is very metric-driven," said Yael Garten, its director of data science. She added that "hundreds of metrics" are in place, part of an analytics process designed to enable data-driven discussions about how features are faring in trial runs.

The process also includes elements meant to make sure the data scientists have valid data to mine and analyze, Garten said in a presentation at the 2017 TDWI Leadership Summit in Las Vegas. Tracking and logging data is part of job descriptions -- and performance reviews -- for developers, and executive approval is needed to launch new features without related data being logged. "We treat data as a first-class citizen," she said.

In addition, data scientists, product managers and developers jointly create data requirements and schemas, which a data model review committee then checks to see whether the specified data will be successfully generated, Garten said. And while feature tests are in progress, the data science team meets weekly with business executives and product teams to review metrics and analytics results.

LinkedIn, which is based in Mountain View, Calif., and was acquired by Microsoft in December 2016, even uses scientific terms as part of the data science process. For example, Garten referred to the feature tests as experiments, and she said the metrics are used to test hypotheses on how features will affect the activities of LinkedIn users on the site.
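To make the idea of testing a hypothesis against a metric concrete, here is a minimal sketch of one common approach, a two-proportion z-test on a conversion-style metric. It is an illustration only, not a description of LinkedIn's actual tooling: the function name, the choice of test and the sample counts are all assumptions.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare a conversion-style metric between a control group (a)
    and a group that sees the new feature (b).

    Hypothetical helper for illustration. Returns (z, p_value), where
    p_value is two-sided under the null hypothesis that both groups
    share the same underlying rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("metric has no variance; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 200 of 2,000 control users acted, versus 260 of 2,000
# users who saw the feature.
z, p = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in the metric is unlikely to be chance alone. With hundreds of metrics in play, a real process would also need to account for multiple comparisons, which this sketch ignores.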

A scientific bent in data science work

Cisco is another company that's applying some scientific rigor to data science applications. Its corporate data science team has adopted a set of "open science" procedures, such as peer reviews of each other's work, said Anu Miller, a senior data scientist at Cisco.

The Cisco team also adheres to CRISP-DM, a data mining and analysis methodology formally known as the Cross-Industry Standard Process for Data Mining. CRISP-DM, which was first developed in the late 1990s, outlines a six-phase process model for data analysts to follow. "We use it to guide our projects all the way through," Miller said in another TDWI conference session. "We're almost religious about this."
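For readers unfamiliar with the methodology, the six phases in the published CRISP-DM process model are business understanding, data understanding, data preparation, modeling, evaluation and deployment. The sketch below encodes them as an ordered checklist. It is a simplified illustration, not Cisco's tooling: the `next_phase` helper is hypothetical, and real CRISP-DM projects frequently loop back to earlier phases rather than moving strictly forward.

```python
from enum import Enum


class Phase(Enum):
    """The six phases of the CRISP-DM process model, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current):
    """Return the phase that nominally follows `current`, or None after
    deployment. Hypothetical helper: it models only the forward path and
    ignores the iteration between phases that CRISP-DM allows."""
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current.value + 1)


# Walk the nominal forward path from start to finish.
phase = Phase.BUSINESS_UNDERSTANDING
while phase is not None:
    print(phase.name.replace("_", " ").title())
    phase = next_phase(phase)
```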

In addition, Miller and her colleagues use a decision-modeling process for tying data analytics efforts to business decision-making that was created by James Taylor, CEO of consultancy Decision Management Solutions. The data science team measures the applications it's working on against Taylor's process to make sure there's a good reason to do the analytics work, Miller said. "We ask each other all the time, 'What business decision are you looking to support?' "

Applying aspects of the scientific method to analytics applications also helps to foster more teamwork among the data scientists, according to Miller. "Those things almost force you to be collaborative," she said. "There are no unicorns on our team. We have to work together."

Failure is an option for data analysts

Donald Farmer, who heads analytics at data management consultancy TreeHive Strategy in Woodinville, Wash., said effective data science applications also call for analytics teams to be willing to experiment -- and to fail in their experiments, just as real scientists often do.

"Innovation involves a lot of failure, and you need to embrace it," Farmer said. "If everything you do works, you're not daring enough -- and you're not really being innovative."

That point resonated with conference attendee Reuben Schooler, senior data engineering manager in the digital transformation group at Duke Energy Corp. in Charlotte, N.C. But, he said, getting agreement internally on a tolerance for analytics failure can be a hurdle, especially in an organization like his that's in the early stages of building a big data architecture to support data mining and data science applications.

"It's science -- the test tube blows up more often than not," Schooler said. "The trick is how to do that so it doesn't have any backlash on your operational systems."

Such issues are currently in play at Duke Energy, an electric utility and natural gas distributor that operates in six states. Schooler said the digital transformation unit was set up alongside the company's main IT department to push new ways of using technology, as part of an effort to become a more data-driven organization. In connection with that, a data science and analytics team was put in place "to guide priorities around what we want to do" in business operations, he added.

But exactly how the new analytics process will operate is still being worked out, according to Schooler. Some basic steps need to be taken first, he said -- for example, deploying Hadoop and other big data technologies, and getting data recognized as a full-fledged corporate asset throughout the company.
