
Machine learning models require DevOps-style workflows

Big data is driving the use of AI and machine learning. But teams must be swift to embrace DevOps and re-evaluate models, according to Wikibon's James Kobielus.

As 2018 gets rolling, it appears that various aspects of big data are morphing into machine learning and AI. The changes that machine learning models bring to big data analytics are not readily apparent.

To sort through recent developments, including data science and DevOps, reporter Jack Vaughan caught up with James Kobielus, lead analyst for AI, data science, deep learning and application development at SiliconAngle/Wikibon. He had just finished a round of 2018 predictions with his colleagues when we caught up with him.

AI was asleep for a few years. Was it just waiting for big data to come along?

James Kobielus: Well, AI has been around a while. For much of that time, rule-based expert systems were at its core -- fixed rules that had to be written by subject matter experts.

What's happened in the last 10 years is that AI in the broad sense -- both in research and in the commercialization of the technology -- has shifted away from fixed, declarative, rule-based systems toward statistical, probabilistic, data-driven systems.

That is what machine learning models are about. Machine learning is the core of modern AI. It's all about using algorithms to infer correlations and patterns in data sets. That's for doing things like predictive analysis, speech recognition and so forth.

Much of the excitement more recently has been from neural networks -- statistical algorithms that in many ways are built to emulate the neural interconnections in our brains. Those, too, have been around since the 1950s, largely as a research focus.

[Photo: James Kobielus, analyst, Wikibon]

In the last 10 years, [neural networks] have become much more powerful. One of the things that has made them much more powerful is there is much more data.

Much of that is unstructured data coming from the real world -- things like social media posts mined for customer sentiment. That has come about as things like Facebook, LinkedIn and Twitter have become parts of our lives. And there is value in being able to get inside your customer's head.

The frontier of that is deep learning; it's machine learning with more processing layers, more neural layers, able to infer higher level abstractions of the data.
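Those "more processing layers" can be made concrete: each layer applies weights and a nonlinearity to the previous layer's output, so later layers operate on increasingly abstract features. A toy forward pass in plain Python -- the weights here are illustrative, not trained:

```python
import math

def layer(inputs, weights):
    """One neural layer: weighted sums passed through a sigmoid nonlinearity."""
    return [1 / (1 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
            for row in weights]

x = [0.2, 0.9]                            # raw input features
h = layer(x, [[1.0, -1.0], [0.5, 0.5]])   # first layer: low-level features
y = layer(h, [[2.0, -2.0]])               # second layer: higher-level abstraction
print(y)
```

A deep network is simply more of these layers stacked, with the weights learned from data rather than hand-set.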

Machine learning is exciting. At the same time, something could go wrong. What challenges will data analytics managers face when moving to these new technologies?

Kobielus: First of all, the fact is that this is tough stuff -- complex to develop and to get right. Any organization needs a group of developers who have mastered the tools and the skills of data science.

Data scientists are the ones who build, train and test these models against actual data -- that is, who determine whether a model predicts what it is supposed to predict. It's not enough to build the algorithms; you have to train them to make sure they are fit for the purpose for which they have been built. And training is tough work.
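Testing a model "against actual data" usually means holding some data back from training and checking predictions on it. A minimal sketch, assuming nothing beyond the standard library -- the synthetic data, noise rate and threshold search are illustrative stand-ins for a real training procedure:

```python
import random

random.seed(0)

# Synthetic data: feature x in [0, 1]; true label is 1 when x > 0.5,
# with a 5% chance the label is flipped (noise).
rows = []
for _ in range(200):
    x = random.random()
    label = int(x > 0.5)
    if random.random() < 0.05:
        label = 1 - label
    rows.append((x, label))

# Hold out 25% of the data for testing -- the model never trains on it.
split = int(len(rows) * 0.75)
train, test = rows[:split], rows[split:]

def fit(samples):
    """'Training': pick the decision threshold that best fits the samples."""
    best_t, best_acc = 0.0, 0.0
    for t in (i / 100 for i in range(101)):
        acc = sum(int(x > t) == y for x, y in samples) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = fit(train)

# Testing: does the model predict what it is supposed to on unseen data?
holdout_acc = sum(int(x > threshold) == y for x, y in test) / len(test)
print(f"threshold={threshold:.2f} holdout accuracy={holdout_acc:.2%}")
```

The holdout accuracy, not the training accuracy, is what tells you the model is fit for purpose.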

You have to prepare the data -- that's no easy feat. Three-quarters of the effort in building out AI involves acquiring and preparing the data to do the training and so forth. The data sets are huge, and they run on distributed clusters. Often, Hadoop and NoSQL are involved. It costs money to deploy all that.
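The data preparation work he describes -- the bulk of the effort -- starts with mundane steps like parsing, dropping unusable records and scaling fields. A small sketch of that kind of cleanup; the field names and records are hypothetical:

```python
# Raw records straight from source systems -- strings, gaps and all.
raw = [
    {"age": "34", "income": "52000"},
    {"age": "", "income": "48000"},    # missing age
    {"age": "29", "income": None},     # missing income
    {"age": "45", "income": "61000"},
    {"age": "52", "income": "39000"},
]

def prepare(records):
    """Parse strings to numbers, drop unusable rows, scale each field to [0, 1]."""
    parsed = []
    for r in records:
        try:
            parsed.append({k: float(r[k]) for k in ("age", "income")})
        except (TypeError, ValueError):
            continue  # skip rows with missing or malformed values
    for k in ("age", "income"):
        lo = min(p[k] for p in parsed)
        hi = max(p[k] for p in parsed)
        for p in parsed:
            p[k] = (p[k] - lo) / (hi - lo) if hi > lo else 0.0
    return parsed

clean = prepare(raw)
print(clean)
```

At production scale, the same steps run across distributed clusters rather than in-memory lists, but the logic -- parse, filter, normalize -- is the same.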

Conceivably, you might outsource much of this infrastructure to your cloud provider, be it [Amazon Web Services], Microsoft Azure, IBM Cloud or whatever it may be. Once again, it is not cheap. Clearly, you need senior management buy-in to get the budget to hire the people and to acquire the technology to do this.

And these are not projects that get done once and are finished -- machine learning models have to be regularly revisited, right? And isn't that where DevOps is coming into greater play?

Kobielus: Yes, you have to keep re-evaluating and retraining the AI models you have deployed. Just because you have built and trained them once, and they've worked at predicting the phenomenon you are looking at, doesn't mean they are going to work forever.


You encounter what is called model decay -- it's been experienced by data scientists forever. Models become less predictive over time. That's simply because the world changes. A model that predicted which items a customer would click on in your e-commerce portal three years ago may not be as predictive anymore. There may be other variables predictive of response rate. So you end up retraining and redeploying.

And that demands an orientation toward AI in a DevOps workflow. To do all that is not trivial. That is, you need to create a workflow that is very operational. It means always being sure you have the best training data and the best-fit AI and machine learning models.
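That retrain-and-redeploy loop can be wired into an operational workflow: track the live model's accuracy on fresh labeled outcomes and flag it for retraining when accuracy falls below a floor. A minimal sketch -- the window size, accuracy floor and simulated drift are illustrative assumptions, not a prescription:

```python
from collections import deque

class DecayMonitor:
    """Track rolling accuracy of a deployed model; flag when it decays."""

    def __init__(self, window=100, floor=0.85):
        self.window = deque(maxlen=window)  # recent hit/miss outcomes
        self.floor = floor                  # retrain below this accuracy

    def record(self, predicted, actual):
        self.window.append(predicted == actual)

    def needs_retraining(self):
        if len(self.window) < self.window.maxlen:
            return False  # not enough evidence yet
        return sum(self.window) / len(self.window) < self.floor

monitor = DecayMonitor(window=50, floor=0.9)
# Simulate a model whose predictions drift out of step with the world.
for i in range(60):
    monitor.record(predicted=1, actual=1 if i < 30 else 0)
print(monitor.needs_retraining())
```

In a DevOps-style pipeline, a `True` here would kick off the same automated train-test-deploy cycle that shipped the model in the first place.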

This was last published in January 2018



Join the conversation

3 comments


What challenges do you see in implementing machine learning and AI in your organization?
The other important number besides 80% (80% of data science effort is in data prep) is 50%: half of fully validated models never get used by the business, because people don't trust them or they aren't delivered quickly enough. DevOps discipline is sorely needed if we are to fully operationalize the power of data science and AI.
Inflexibility, poor quality, and other obstacles hinder the successful production of analytics for data-driven organizations.  Other types of organizations have faced similar challenges and the lessons learned in these other domains can be applied in data analytics.  In software development, both Agile Development and DevOps have led to a major transformation in the speed and quality of code creation.  In manufacturing, statistical process controls (SPC) assure quality and provide early feedback on non-conformances.  Applying these methods to data analytics is called DataOps. DataOps is a combination of tools and process improvements that enable rapid-response data analytics at a high level of quality.  DataOps adapts more easily to user requirements, even as they evolve, and ultimately supports improved data-driven decision-making. DataOps is needed for the entire analytics pipeline: data engineering, data science, and visualization.
