A growing crop of cloud platforms is targeting machine learning users with applications and services that can serve as alternatives to complicated and disparate open source tools. Some say there's cause for caution in evaluating these tools, however.
"We're seeing a really massive shift of big data into the public cloud," said Brian Hopkins, an analyst at Forrester Research. "Companies that jumped on Hadoop early are now facing upgrades; companies are challenged to maintain multiple versions of Spark. In the midst of all this expensive on-premises big data, you can get in the cloud in a week or so."
Open source tools are still at the vanguard of big data and advanced analytics. But they can be complicated to stand up, and stitching together an analytics infrastructure from open source data management and analytics tools can challenge any data team. This has led many enterprises to consider doing machine learning in the cloud, where infrastructure is generally managed by the vendor.
Cloud big data platforms are expected to grow in prominence. Forrester forecasts that cloud subscriptions will outpace new on-premises big data implementations by a factor of 7.5 over the next five years.
Big data software companies are now trying to poach some open source users with simplified tools that handle everything, including data ingestion, storage, machine learning model building and application deployment.
The appeal of machine learning in the cloud
This integrated environment is part of what brought AgroTools to the Google Cloud Platform. AgroTools is a Brazil-based supply chain intelligence company that tracks the state of agriculture production in Brazil and reports to clients like farmers, supermarkets and loan officers. To develop reports, the company assembles data from public sources like business filings and governmental agency reports, as well as proprietary sources like satellite imaging.
All the data lands in AgroTools' Google database, where it's structured and analyzed. A year ago, the company tracked information on about 300,000 farms. Today, that number is over one million, a more than threefold increase in a year. Fernando Martins, the company's CEO, said the Google Cloud Platform helped the company scale to handle that growth.
"A lot of medium-sized folks like us, it's not something we could have done ourselves," he said. "It's almost infinite scalability."
Scalability and the fact that the vendor maintains the actual infrastructure are among the biggest selling points of cloud big data platforms.
"If you look at most big data projects, most companies are kind of disappointed because they're spending most of their time and money just to maintain the systems," said Greg DeMichillie, director of product management for Google Cloud. "They're looking to get out of the business of taking care of machines."
This is becoming even more true as businesses look to analyze a greater diversity of data sources, including data from click streams and social media. Advances in image-recognition deep learning algorithms are opening the door for enterprises to analyze online images and videos, too. All of this is exploding already large data stores and making management a central issue for businesses.
"The data is growing so fast that you cannot just store it in your own data center," said Roman Stanek, CEO of data platform vendor GoodData. "You need to do it in the cloud."
Some cautions before buying
Of course, not everyone is sold on platforms for machine learning in the cloud. Trulia's vice president of engineering, Deep Varma, said he has evaluated the different offerings but has chosen to keep most of the company's machine learning work in-house. His team primarily uses open source tools, including Redis as a database, Kafka for data ingestion and Python for data analysis, to build and deploy an analytics infrastructure.
Varma said the decision of whether to build it yourself or outsource the work to a software vendor comes down to how important machine learning is to your company's operations. In the case of Trulia, Varma said using machine learning to understand its customers is the core of the real estate shopping marketplace's offering, so it's important for employees to have hands-on knowledge of how machine learning models work every step of the way. This makes troubleshooting easier and also allows for a more custom approach.
"If you believe something is core to your success, you better invest in that area," Varma said. "But if that's not your goal, don’t go and do it from scratch."
Even if an enterprise decides a machine learning platform is its best option, there are still a number of cautions when evaluating the various vendors. Forrester's Hopkins said Amazon's platform has a lot of components that need to be configured, making it complex. Microsoft's platform is still geared more for application development than strictly machine learning. IBM is rolling out improvements to its cloud infrastructure but makes most of its money from -- and therefore gives most of its attention to -- on-premises software.
But all of these vendors should improve their platforms, as they are all looking to own the market for cloud machine learning platforms. Hopkins expects to see a continuation of an arms race already well underway. To lure businesses, and all their data, to their platforms, vendors will offer low prices for storage and a lengthy list of machine learning services, such as curated pre-written algorithms.
"Cloud will commoditize," Hopkins said. "Storage is close to zero in cost, and infrastructure will go the same way. It will be all about services, and data analytics products will be the differentiator."