
Deep learning algorithms demand nearly limitless supplies of data

More data almost always makes deep learning projects more effective. One data science manager says there's essentially no limit to the amount of data he wants for his team's projects.

In any deep learning project, it's almost impossible to imagine an upper limit on the amount of data needed for training models and conducting analyses.

"We need to get more data," said Patrick Lucey, director of data science at sports consulting company STATS LLC in Chicago. "We're really just scratching the surface. We want to reconstruct that story, [and] tell better stories, and we're limited because we can't get all the data we want."

Deep learning -- the use of machine learning models built from many layers of neural networks strung together -- isn't a new concept. However, it started to gain more widespread traction last year, as researchers and enterprises realized that analytical models could be turned loose on the massive troves of data businesses had accumulated since the dawn of the big data era. Deep learning algorithms require experience to sharpen their recommendations, and big data provides them with exactly the fuel they need.
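The "layers strung together" idea can be pictured in a few lines of Python. This is a toy forward pass through two stacked layers with random weights -- an illustration of the structure, not anyone's production model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Elementwise nonlinearity applied between layers
    return np.maximum(0, x)

# A batch of 4 input examples, each with 8 features
x = rng.normal(size=(4, 8))

# Two layers "strung together": the output of the first
# feeds directly into the second.
w1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 2)), np.zeros(2)

hidden = relu(x @ w1 + b1)   # first layer
output = hidden @ w2 + b2    # second layer

print(output.shape)  # (4, 2): one 2-way score per example
```

Training consists of adjusting the weight matrices to fit examples, which is why the appetite for data grows with the number of layers and weights.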

But this raises a question: When is enough data enough? Some of the most prominent deep learning examples used hundreds of thousands, even millions, of records during model training. But sometimes even that isn't enough.

At STATS, Lucey has access to ample data, but said he still feels models could function better with more. The company maintains databases of game data going back to its beginnings in 1981. Its deepest data sets, covering the NBA since 2010, come from its SportVU system, a network of cameras installed at sports arenas that captures player movement data.


This wealth of data has enabled Lucey and his team to do some interesting things with deep learning. For example, he and his team developed a model that looks at video data from NBA games and analyzes players' body positions to better define what an open shot looks like.

Another STATS project applied deep learning algorithms to English Premier League soccer. STATS analyzed data beyond traditional statistics, like shots and goals, to understand the factors that led to longshot Leicester City Football Club taking home the title in the league's 2015-2016 season, which ended last May.

The data science team at STATS primarily builds models in open source tools, such as the Google-created TensorFlow and scikit-learn, a library of machine learning models built in Python.
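As a rough sketch of the kind of workflow these tools support -- the data below is synthetic and the two features are invented stand-ins, not STATS' real inputs -- a scikit-learn model is typically fit and evaluated like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two made-up features per shot
# (say, defender distance and shot distance) and a binary
# label for whether the shot was "open".
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)

# Hold out 20% of the records to measure generalization
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"holdout accuracy: {accuracy:.2f}")
```

The same fit/score pattern scales up to deeper models in TensorFlow; what changes most is how much labeled data the model needs to reach a useful accuracy.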

These projects have been successful, according to Lucey. However, he added that he's already looking to sharpen analyses, and he thinks more data will help.

[Image: Damian Lillard body position skeleton. SportVU creates a skeleton of players' body positions, turning video into structured data.]
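The structured data a skeleton-tracking system produces can be pictured as one record per joint per frame. The field names below are invented for illustration -- they are not SportVU's actual schema:

```python
# One frame of toy skeleton data: (joint name, x, y) in court
# coordinates. All names here are illustrative placeholders.
frame = {
    "player_id": 0,
    "timestamp_ms": 120_040,
    "joints": [
        ("head", 12.4, 6.1),
        ("left_shoulder", 12.1, 5.6),
        ("right_shoulder", 12.7, 5.6),
        ("left_hand", 11.8, 5.9),
        ("right_hand", 13.0, 6.0),
    ],
}

# Downstream models consume flat numeric vectors, so a frame
# is typically flattened into one row of coordinates.
row = [coord for _, x, y in frame["joints"] for coord in (x, y)]
print(len(row))  # 10 values: 5 joints x (x, y)
```

Rows like this, accumulated over thousands of frames, are what turn raw video into training data a model can learn from.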

In addition to larger data volumes, more detailed information will be necessary, he noted. Deep learning algorithms thrive on granular detail as much as on sheer volume, and that detail will play an important role as these models continue to improve and describe the world more accurately.

"That's the key -- finding that context," Lucey said. "You can get a good prediction, but if it's washed over by context, it's not as valuable. You have to have the data."

Next Steps

Deep learning will be important, but perhaps not groundbreaking

Deep learning to make artificial intelligence more human-like

Deep learning plays an important role in pursuit of artificial intelligence
