A spate of new tools is looking to turn Hadoop into more of a real-time analytics engine, challenging its reputation as a lumbering giant and turning it into something more nimble.
Hadoop has garnered a lot of attention in the last couple of years for its ability to process huge volumes of varied data. The problem is its reliance on batch processing: users have been increasingly vocal about how it can slow down data extraction for iterative analyses. Enter two new players, each promising speed from its Hadoop analytics tools.
Skytree puts machine learning on Hadoop clusters
This week, Skytree Inc., based in San Jose, Calif., released the latest version of its machine learning software, promising improved extract, transform and load (ETL) functions for unstructured data and beefed-up security tools, which are intended to strengthen governance and position the software as more of an enterprise-grade system.
Skytree's machine learning software can operate on data from a variety of sources, but the company is pushing the product as a natural pairing for Hadoop. Robert Dutcher, vice president of marketing at Skytree, said the company took that tack because of the parallel processing of data in Hadoop. Even though Hadoop relies on batch processing, spreading jobs out across many nodes can actually deliver respectable speed, at least compared to running a job on a single node, Dutcher said. That speed lets programmers iterate on machine learning jobs.
The new software release focuses on doing machine learning on unstructured data. This was always possible with Skytree, but the company claims the update simplifies the preparation of unstructured data for analysis by further automating a process that assigns attributes to data points.
Included in the update is a new administrative console, which lets administrators assign privileges governing which data sources users can access and how they can combine them.
Arcadia pushes BI into Hadoop
Meanwhile, Arcadia Data, Inc., a relatively new player in the BI space that until this summer was operating in stealth mode, announced its new BI-on-Hadoop tool designed to eliminate the intermediary systems between Hadoop and business users.
The company, based in San Mateo, Calif., accomplishes this goal through a system that learns users' queries over time and creates predefined forms within Hadoop for commonly queried data. Doing so allows users to reach into Hadoop quickly for these common queries, without having to go through a lengthy and complicated ETL process each time.
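The idea of learning common queries and precomputing their answers can be illustrated with a minimal Python sketch. This is not Arcadia's implementation, which the company has not detailed here; the class name `FrequentQueryCache`, the `threshold` parameter and the `run_query` callback are all hypothetical, and a real system would store results inside Hadoop rather than in memory.

```python
from collections import Counter


class FrequentQueryCache:
    """Illustrative only: remember results for queries that recur often."""

    def __init__(self, run_query, threshold=3):
        # run_query is a hypothetical stand-in for a slow full pass over the data.
        self.run_query = run_query
        self.threshold = threshold
        self.counts = Counter()      # how often each query has been seen
        self.precomputed = {}        # stored results for common queries

    def query(self, text):
        self.counts[text] += 1
        if text in self.precomputed:
            # Common query: answer from the stored result, skipping the slow path.
            return self.precomputed[text]
        result = self.run_query(text)
        if self.counts[text] >= self.threshold:
            # The query has become common enough to keep its result on hand.
            self.precomputed[text] = result
        return result


if __name__ == "__main__":
    slow_calls = []

    def slow_scan(text):
        slow_calls.append(text)
        return text.upper()

    cache = FrequentQueryCache(slow_scan, threshold=2)
    for _ in range(5):
        cache.query("sales by region")
    print(len(slow_calls))  # number of times the slow path actually ran
```

The sketch ignores what happens when the underlying data changes; any real system would also need a way to refresh or discard stored results.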
The tool presents a visual interface that allows business users to explore and visualize data. It comes with prebuilt functions that can perform customer segmentation, path-to-purchase analysis and more.
Even though Hadoop is traditionally a batch processing engine, which can have speed limitations, Arcadia executives said the speed of their tool compares favorably to the BI tools of their competitors -- such as Tableau Software and MicroStrategy Inc. -- because jobs run natively in Hadoop, without going through intermediary systems.
The BI-on-Hadoop tool is priced annually, per node on which the software is installed.