IBM cracks the code for speeding up its deep learning platform

Using more GPUs when training deep learning models doesn't always deliver faster results, but new software from IBM shows it can be done.

GPUs are a natural fit for deep learning because they can crunch through large amounts of data quickly, which is important when training data-hungry models. But there's a catch.

Adding more graphics processing units (GPUs) to a deep learning platform doesn't necessarily lead to faster results. While individual GPUs process data quickly, they can be slow to communicate their computations to other GPUs. That bottleneck has limited how far users can parallelize jobs across multiple servers, and it has capped the scalability of deep learning models.
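To make the scaling limit concrete, here is a minimal Python sketch of a toy cost model. It is not IBM's algorithm or a measurement of any real system. The function names, the one-second compute time and the per-peer synchronization cost are all hypothetical values chosen for illustration. The model assumes that compute work divides evenly across GPUs while the time spent exchanging results grows with the number of peers.

```python
def step_time(n_gpus, compute_s=1.0, sync_s_per_peer=0.02):
    """Seconds for one training step on n_gpus under a toy cost model.

    compute_s and sync_s_per_peer are hypothetical constants, not
    measurements of any real hardware.
    """
    compute = compute_s / n_gpus            # work splits evenly across GPUs
    sync = sync_s_per_peer * (n_gpus - 1)   # assumed: sync cost grows with peer count
    return compute + sync


def speedup(n_gpus, **kwargs):
    """Speedup relative to a single GPU under the same toy model."""
    return step_time(1, **kwargs) / step_time(n_gpus, **kwargs)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64):
        s = speedup(n)
        print(f"{n:>3} GPUs: speedup {s:5.2f}x, efficiency {s / n:6.1%}")
```

Under these assumed constants, the speedup stops improving after a handful of GPUs and then declines, because synchronization time eventually outweighs the compute time saved. Lowering `sync_s_per_peer`, which stands in here for faster communication between GPUs, pushes that turning point out to a larger number of GPUs.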

IBM recently took on this problem, writing code for its deep learning platform that improves communication between GPUs.

"The rate at which [GPUs] update each other significantly affects your ability to scale deep learning," said Hillery Hunter, director of accelerated cognitive infrastructure at IBM. "We feel like deep learning has been held back because of these long wait times."

Hunter's team wrote new software and algorithms to optimize communication between GPUs spread across multiple servers. The team used the algorithm to train an image-recognition neural network on 7.5 million images from the ImageNet-22k data set in seven hours. This is a new speed record for training neural networks on the image data set, breaking the previous mark of 10 days, which was held by Microsoft, IBM said.

Hunter said it's essential to speed up training times in deep learning projects. Unlike workloads in virtually every other area of computing today, training deep learning models can take days, which might discourage more casual users.

"We feel it's necessary to bring the wait times down," Hunter said.

IBM is rolling out the new functionality in its PowerAI software, a deep learning platform that pulls together and configures popular open source machine learning software, including Caffe, Torch and TensorFlow. PowerAI is available on IBM's Power Systems line of servers.

But the main reason to take note of the news, according to Forrester analyst Mike Gualtieri, is that the GPU optimization software might bring new functionality to existing tools -- namely Watson.

"I think the main significance of this is that IBM can bring deep learning to Watson," he said.

Watson currently has API connectors for users to do deep learning in specific areas, including translation, speech to text and text to speech. But its deep learning offerings are prescribed. If IBM opens Watson up to open source deep learning platforms, its strength in answering natural-language queries could be applied to deeper questions.
