Machine learning algorithms and artificial intelligence tools are receiving a lot of attention in the analytics world these days, and industry experts and experienced users say the plaudits are well deserved.
"These models are making a big difference, and if you're not considering how to use them in your product, you probably should," said Jeff Dean, a senior fellow at Google who helped lead development of TensorFlow, the company's open source machine learning platform.
Machine learning has come to play a central role in the majority of new products Google develops, Dean said in a presentation at Spark Summit 2016 in San Francisco. For example, it's at the core of training speech-recognition tools used in the Android mobile operating system. Machine learning technology also helped Google create a tool that automatically tags the photos users upload by examining what's happening in each image. Other uses include training the Google Translate app, creating automated email responses in Gmail and training the AlphaGo system, which recently beat human champions at the game of Go.
Machine learning for the masses
Machine learning and artificial intelligence (AI) may sound intimidating, but Dean said enterprises don't need the technical resources of a company like Google to get started. There are now lots of options that let businesses bring their own data to machine learning platforms that contain pretrained models or algorithms that organizations can train themselves. Google offers such a service, and the Spark data processing engine contains a library of machine learning algorithms. Such offerings lower the bar to entry.
Other speakers at the Spark conference agreed the time is ripe for machine learning applications across various vertical markets. "When I look at each industry, I feel like we have a decent chance of transforming most of them -- some in the short term, some in the long term," said Andrew Ng, chief scientist at China-based internet services company Baidu Inc.
In particular, Ng said machine learning and AI are likely to improve areas like web search, consumer financial services and fraud detection. He added that they could also play an important role in forecasting data center outages and recommending fixes.
The catch, Ng said, is businesses will need a lot of labeled, structured data. While machine learning in its various flavors, including deep learning, is good at making sense out of unstructured data, analytical models need to be trained on structured data. That gives them a frame of reference from which to make inferences.
Before taking the plunge into advanced machine learning, businesses need to make sure they have data sets of sufficient size and quality. But even with that caveat, Ng said it's likely to be worthwhile for most businesses to start a machine learning program. "There's a lot of hype about deep learning -- should you use it, should you not," he said. "It will create so much value for corporations and users. You can transform entire industries."
Machine learning as fraud-fighting tool
Capital One Financial Corp.'s technology team uses the library of machine learning algorithms that comes with Spark to analyze new account applications and score them for potential fraud risk. Spark lets the team quickly and easily combine data from multiple sources, allowing for deeper, more accurate predictions, said Chris D'Agostino, the McLean, Va., banking company's vice president of technology.
The analytical models draw on data from Capital One's own graph database and a Hadoop cluster, as well as third-party data from credit reporting agencies. The tech team uses Spark's stream processing module in conjunction with the algorithms in MLlib, Spark's machine learning library, to score new applications as they come in and to continually retrain the models for better accuracy.
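The score-then-retrain loop described above can be illustrated with a small, self-contained sketch. It is not Capital One's pipeline and it does not use Spark or MLlib: it is a plain-Python stand-in that assumes a simple logistic model updated one record at a time, with made-up features and synthetic labels. The names `OnlineFraudScorer`, `synthetic_stream` and `run_stream` are hypothetical and exist only for this illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class OnlineFraudScorer:
    """Hypothetical logistic model trained one record at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        """Return an estimated fraud probability between 0 and 1."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)

    def update(self, features, is_fraud):
        """Take one gradient step on the log loss for a labeled record."""
        error = self.score(features) - is_fraud
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


def synthetic_stream(n, seed=0):
    """Yield (features, label) pairs; the data is invented for illustration."""
    rng = random.Random(seed)
    for _ in range(n):
        is_fraud = 1 if rng.random() < 0.3 else 0
        center = 1.0 if is_fraud else -1.0
        yield [rng.gauss(center, 1.0), rng.gauss(center, 1.0)], is_fraud


def run_stream(scorer, stream):
    """Score each record as it arrives, then train on its label."""
    hits = []
    for features, is_fraud in stream:
        risk = scorer.score(features)          # score before seeing the label
        hits.append((risk >= 0.5) == bool(is_fraud))
        scorer.update(features, is_fraud)      # then fold the label back in
    return hits


scorer = OnlineFraudScorer(n_features=2)
hits = run_stream(scorer, synthetic_stream(2000))
early_accuracy = sum(hits[:100]) / 100
late_accuracy = sum(hits[-500:]) / 500
```

In a real deployment, the fraud label for an application would typically arrive well after the score is issued, so the update step would run on delayed feedback rather than immediately as it does in this simplified loop.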
It would be difficult to run that kind of machine learning job without Spark in the architecture, D'Agostino said, because it would mean trying to stitch together a variety of data stores and hand-coding machine learning algorithms -- a time- and resource-intensive project. "If you're trying to do this quickly and smartly, the more you can consolidate onto one platform, the better," he said.