This week, Seattle-based startup GraphLab introduced a new analytics platform that it says will help scale up machine learning techniques for enterprise-grade performance.
The company's founder and CEO, Carlos Guestrin, said the goal is to bring multiple data stores under one analytics roof to simplify big data. A July survey of more than 100 data scientists conducted by software vendor Paradigm4 found that 71% of respondents feel data variety, not volume, makes their job difficult.
The new release, GraphLab Create, can be placed on top of many popular data stores, including Hadoop clusters, Amazon Redshift databases, NoSQL databases and table-based databases. It provides an application programming interface layer that includes some pre-built algorithms for things like recommendation engines, and allows users to write jobs in Python. The company will not release pricing information until later this year.
In some ways, GraphLab is going up against Apache Spark, a platform that similarly functions as a common interface for many different data stores. There are also similar tools available from Skytree. But Guestrin said Create is more stable and mature at this point than the open-source Spark, and allows for more customization than products from Skytree, which come fully built.
The platform uses Python, rather than R or another programming language, because Python lets machine learning techniques scale up to enterprise needs more effectively, Guestrin said.
"The issue we're seeing is there's a lot of buzz around machine learning, but there's a big talent gap," Guestrin said. "Hiring someone like a data scientist has become really hard. What we're trying to do is address this problem."
Users have been calling for some kind of unified big data platform, and Create fits that mold. But it will have to fight it out against Spark and other players to claim the niche. Accessibility may play a role in who wins this fight. Create is certified on the Cloudera Hadoop distribution and comes packaged with the Pivotal Hadoop distribution. Currently, Spark is packaged with those two distributions, as well as those from Hortonworks, IBM and MapR. GraphLab has around 100 beta testing customers, including Zillow, Pandora and ExxonMobil.