Machine learning projects face data prep, model building hurdles

IT and analytics managers discuss the biggest challenges of machine learning applications, with data preparation and development of algorithm-driven analytical models sharing top billing.

Machine learning has been part of the advanced analytics picture for decades, but the emergence of big data platforms and better tools for creating automated analytical algorithms is bringing it more front and center. As a result, growing numbers of IT and analytics teams face the challenges of making machine learning projects work.

In many organizations, machine learning initiatives require big investments in IT infrastructure, often involving the deployment of Hadoop clusters, the Spark processing engine and other big data technologies. New data management and analytics processes are often also needed to get data sets ready for analysis and to develop the algorithms that will be run against them. In many cases, that means adding new skills through outside hiring or retraining of existing employees.

Deep learning applications, a further step along the artificial intelligence curve, add to the challenges for organizations looking to run even more complex analytics jobs -- for example, interpreting images to classify them by their content. In particular, deep learning raises the degree of difficulty for data scientists and statisticians who build predictive models powered by automated algorithms.

To get some real-world insight into the hurdles that can trip up machine learning projects, we asked experienced attendees at the Hadoop Summit 2016 conference in San Jose, Calif., about the biggest challenges they've encountered. Their answers touched on the complexity of both the upfront data preparation work and the use of machine learning algorithm libraries in the model development process. Here's what they had to say, presented verbatim.

Chester Chen, senior manager of data science and engineering at wearable camera maker GoPro Inc.: "The biggest challenge is really preparing the data. All this data is coming in in different forms -- getting the proper data in the right data pipelines is a pretty daunting task."

Peter Crossley, CTO at web, mobile and internet of things analytics services provider Webtrends Inc.: "Getting data that's sanitized or managed in some form. You have to have a normalized data set -- you can dump all your data in a data lake, but then you have this marsh of data that can be hard to analyze."

Murali Kaundinya, innovation engineering director at pharmaceuticals maker Merck & Co.: "The [data analysts] don't want to deal with all the machine learning libraries. The big challenge is to present them with a platform they can use without becoming a machine learning expert."

Bryan Lari, director of institutional analytics at The University of Texas MD Anderson Cancer Center: "You're starting with imprecise data -- so, it's getting to a high enough level of precision in the data that you're confident you're getting accurate results."

Sumeet Singh, senior director of cloud and big data platforms at Yahoo Inc.: "We have to make it a whole lot simpler. [Data scientists] could easily spend a month or two just to evaluate a particular library before doing anything with it. That's an impediment."

This was last published in October 2016