- Craig Stedman, Editor at Large
Data scientists are adept at manipulating data and building analytical models to run against it. But they may not be well-versed in the meaning and relevance of business data, which is prompting data science teams to link up with business analysts and other workers who can help them better understand the data they analyze.
Closer ties to business data experts can also reduce the upfront data preparation work that falls to data scientists. Data prep often eats up a large chunk of their time, but leaning on curated data sets created by the likes of business analysts frees up data scientists to focus more on their analytics duties -- and helps ensure they're working with data germane to business operations.
That's in keeping with the main goal: ensuring that the advanced analytics work done by data science teams aligns with real business needs and issues.
"We always have business representation [in the analytics process], because there's no value in manipulating data for no valid reason," said Christian Anschuetz, chief digital officer at product safety testing and certification company UL LLC. "Our mission is not chasing data for data's sake."
Anschuetz is in charge of IT, analytics and other digital operations at UL, formerly known as Underwriters Laboratories. He said the Northbrook, Ill., company pairs up its data scientists with trained linguists, who help build natural language processing algorithms to ingest unstructured data and prepare it for analysis, as well as with its scientific leaders in areas such as regulatory compliance and chemicals, who help focus analytics work on particular business problems.
"They work kind of as one, so to speak," Anschuetz said, adding that the collaboration brings together "a spectrum of skill sets," including expertise in both data science and traditional scientific methods. In addition to focusing analytics applications on pertinent issues, aggregating the different teams helps speed up the analytics process, he noted.
Not starting from scratch
Julia Silge, a data scientist at Stack Overflow, said she regularly reuses business logic and metrics that were built into data sets and analytics dashboards by workers in finance, sales and other departments at Stack Exchange, the New York company that operates the online community and careers site for software developers. Silge needs to connect different data sets together to do predictive modeling, and she said it would be "extremely onerous" to create all the business parameters herself from the raw data.
Much of the business logic on data definitions and attributes is also "very complex," particularly for finance data, Silge said. Working together with the finance team and others on the business side enables her "to make sure I'm using the same language on data as other people in the organization are," she explained. "I'm taking advantage of their knowledge and all the work they've put into the data."
Stack Overflow uses Looker Data Sciences Inc.'s data modeling and analytics platform to create dashboards and cleansed data sets that can be incorporated into analytics applications. For example, the data science team analyzes user engagement with the site's features and interactions between Stack Overflow's sales reps and the companies that pay to post job openings on the site. Silge said she builds analytical models primarily in R or Python, but the Looker software gives her an automated way to collaborate with business users; it also offers an API that lets her use R to set up tables for her analytics jobs in Looker.
Looker added R and Python connections and integration with machine learning platforms from Google and Big Squid to its software in May, along with other features designed to make it easier for data science teams to tap into and analyze data sets prepared by data engineers, business analysts and other workers. That came three months after Periscope Data, Looker's closest rival, similarly added support for R and Python to its software.
Data support for data scientists
In a session at April's Enterprise Data World 2018 (EDW) conference in San Diego, Susan Meyer, data strategy leader for the supply chain quality testing team at Bayer AG's Monsanto subsidiary, said data scientists "increasingly need support from business specialists to be successful." She recommended shifting the duties of business analysts "more toward the data needs of data scientists" -- even to the point of changing their job title to business data analyst.
At Monsanto, Meyer said, the responsibilities of such business data experts include identifying analytics opportunities, doing data discovery work and gathering business requirements for analytical models. "It focuses everybody on the business decisions being made, as opposed to just focusing on data sources and what's in them," she said.
Business analysts should be deeply entwined with -- or even embedded in -- data science teams to ensure that the collaboration between them "isn't a one-time conversation," Meyer added.
Also at the EDW conference, a manager of BI and data warehouse systems said his team includes data engineers and architects who work with his company's data analysts to help them identify and locate relevant data. The analysts are embedded in a business unit and use self-service BI tools, "but a lot of them don't know what data there is, where it is or how to find it," said the manager, who asked not to be identified.
It's a two-way street, though. Much of the data in the company's data warehouse is captured in a basic way that doesn't fully meet business needs for analytics, the manager said. He added that the data analysts provide guidance on how to structure new data fields in a more business-specific way to better support analytics applications.
Data science teams have a growing arsenal of available advanced analytics tools to broaden their analysis work. "We can do things with machine learning that humans can't do on their own," UL's Anschuetz said. For example, data scientists at UL are now analyzing customer sentiment on the products it tests for companies. But without a solid business understanding of data, he warned, all that analytics firepower could be aimed at the wrong targets.