Yahoo, seeking to capitalize on the big buzz around "big data" and cloud computing, yesterday spun off a new business...
zeroing in on its development of the open source tool Apache Hadoop.
Hortonworks, formed with the venture capital firm Benchmark Capital, will focus on advancing the open source software’s current framework. Hadoop has grown increasingly popular because of its cloud computing technology and its ability to process and store large amounts of data coming from semi-structured and unstructured as well as structured sources. The new company will also offer support for businesses and vendors alike. Yahoo and Benchmark Capital have not disclosed how much each company invested in the new business.
A formal announcement is expected to come later today at the Hadoop Summit, being held in Santa Clara, Calif. One of the summit presenters, Eric Baldeschwieler, has already been named the chief executive officer of Hortonworks, according to a release that surfaced yesterday. Baldeschwieler, who is slated to talk about the past and the future of Hadoop at the summit, was formerly the vice president of software engineering for the Hadoop team at Yahoo. He will be joined at Hortonworks by about 25 others, including several Yahoo employees and core Hadoop architects and developers.
"We anticipate that within five years, more than half the world's data will be stored in Apache Hadoop,” Baldeschwieler said in a press release. “We've assembled a top-caliber team committed to the Apache open source community and with the technology and business expertise to deliver value to the big data market.”
Although an open source community of engineers and architects has had a hand in developing Hadoop, Yahoo has been its true pioneer and the largest contributor to the project, developing about 70% of the code. In return, Hadoop has been instrumental in managing Yahoo’s voluminous data, which runs on 42,000 servers and delivers content to nearly 700 million customers worldwide, according to the Hortonworks website.
“Apache Hadoop has been and will continue to be an important area of investment for Yahoo,” Jay Rossiter, senior vice president of Yahoo’s cloud platform group, said in a press release. “The creation of Hortonworks will enable Yahoo to leverage a commercial partnership in addition to our continued internal investment to accelerate the evolution of the technology and its use to power Yahoo’s business.”
Is it time for another look at Hadoop?
Yahoo's decision to spin off Hortonworks will add fuel to the emerging market for enterprise-grade Hadoop distributions. However, potential customers of the new company need to proceed cautiously, according to one industry analyst.
With Hortonworks, Yahoo has effectively become one of the biggest names in the Hadoop business, said James Kobielus, a senior data management analyst with Cambridge, Mass.-based Forrester Research Inc. But Yahoo also lacks the experience of a veteran commercial software vendor.
"Yahoo has been a Web 2.0 pure play from the start and now they're getting into the products business," Kobielus said. "Can Yahoo manage an actual software product group? They're unproven, so that remains to be seen."
End users can expect the market for enterprise-grade Hadoop to heat up in the coming months as new vendors like Hortonworks enter the market and established vendors like Cloudera, DataStax and MapR continue to develop and launch new products, Kobielus added.
In fact, just today Cloudera rolled out new tools for Cloudera Enterprise 3.5, which beefs up its data management offerings, and Cloudera SCM, which the company claims provides easy installation and configuration for a complete Hadoop-based stack. MapR also announced new partnerships with several companies to help leverage big data analytics.
"We're going to see a glut of these kinds of vendors until there is the inevitable shakeout," he said. "For end users, this means that they need to take a renewed look at Hadoop for addressing problems that [are usually handled by] data warehouse and analytics vendors like Teradata and Oracle and IBM."
The competition to become the tool for big data management and analytics stretches beyond the world of Hadoop. Earlier this month, LexisNexis Risk Solutions said it will release its High Performance Computing Cluster (HPCC) as open source to serve as a Hadoop competitor. Armando Escalante, the chief technology officer for LexisNexis, said the multiparallel batch processing engine known as Thor will run queries four times faster than Hadoop.
Kobielus expects Hortonworks to begin offering products or services within the next six to nine months.
"They're going to remain fully open source in that all development done by Hortonworks will be contributed back to the Apache Hadoop community," he said.