Essential Guide

Managing Hadoop projects: What you need to know to succeed

A comprehensive collection of articles, videos and more, hand-picked by our editors

Handling the hoopla: When to use Hadoop, and when not to

Hadoop has become everyone's big data darling. But it can only do so much, and savvy businesses need to make sure it's a good fit for their needs.

This article can also be found in the Premium Editorial Download: Business Information: Putting big data technology in its place.

In the past few years, Hadoop has earned a lofty reputation as the go-to big data analytics engine. To many, it's synonymous with big data technology. But the open source distributed processing framework isn't the right answer to every big data problem, and companies looking to deploy it need to carefully evaluate when to use Hadoop -- and when to turn to something else.

For example, Hadoop has ample power for processing large amounts of unstructured or semi-structured data. But it isn't known for its speed in dealing with smaller data sets. That has limited its application at Metamarkets Group Inc., a San Francisco-based provider of real-time marketing analytics services for online advertisers.

Metamarkets CEO Michael Driscoll said the company uses Hadoop for large, distributed data processing tasks where time isn't a constraint. That includes running end-of-the-day reports to review daily transactions or scanning historical data dating back several months.

But when it comes to running the real-time analytics processes that are at the heart of what Metamarkets offers to its clients, Hadoop isn't involved. Driscoll said that's because it's optimized to run batch jobs that look at every file in a database. It comes down to a tradeoff: In order to make deep connections between data points, the technology sacrifices speed. "Using Hadoop is like having a pen pal," he said. "You write a letter and send it and get a response back. But it's very different than [instant messaging] or email."

Because of the time factor, Hadoop has limited value in online environments where fast performance is crucial, said Kelly Stirman, director of product marketing at 10gen Inc., developer of the MongoDB NoSQL database. For example, analytics-fueled online applications, such as product recommendation engines, rely on processing small amounts of information quickly. But Hadoop can't do that efficiently, according to Stirman.

No database replacement plan

Some businesses might be tempted to scrap their traditional data warehouses in favor of Hadoop clusters because costs are so much lower with the open source technology. But Carl Olofson, an analyst at market research company IDC, said that is an apples-and-oranges comparison.

Olofson said the relational databases that power most data warehouses are designed to accommodate trickles of data that come in at a steady rate over a period of time, such as transaction records from day-to-day business processes. On the other hand, he added, Hadoop is best suited to processing vast stores of accumulated data.

And because Hadoop is typically used in large-scale projects that require clusters of servers and employees with specialized programming and data management skills, implementations can become expensive, even though the cost-per-unit of data may be lower than with relational databases. "When you start adding up all the costs involved, it's not as cheap as it seems," Olofson said.

Specialized development skills are needed because Hadoop uses the MapReduce software programming framework, with which relatively few developers are familiar. That can make it difficult to access data in Hadoop from SQL databases, according to Todd Goldman, vice president of enterprise data integration at software vendor Informatica Corp.
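
For readers unfamiliar with the model, the sketch below illustrates the general MapReduce pattern in plain Python. It is not Hadoop's actual API (Hadoop jobs are typically written in Java against its own libraries). The function names map_phase, shuffle, reduce_phase and run_job are hypothetical, the sample records are made up for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Emit one (word, 1) pair for every word in a single input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence (single process, no cluster)."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample records, not data from the article.
    sample = ["ad click", "ad view", "ad click"]
    print(run_job(sample))
```

Even a simple count has to be expressed as separate map and reduce steps rather than as a single SQL query, which hints at why SQL-trained staff may find the model unfamiliar.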

Various vendors have developed connector software that can help move data between Hadoop systems and relational databases. But Goldman thinks that for many organizations, too much work is needed to accommodate the open source technology. "It doesn't make sense to revamp your entire corporate data structure just for Hadoop," he said.

Helpful, not hype-full

One good use for Hadoop that Goldman cited is as a staging area and data integration platform for running extract, transform and load (ETL) functions. That may not be as exciting an application as all the hype over Hadoop seems to warrant, but Goldman said it particularly makes sense when an IT department needs to merge large files. In such cases, the processing power of Hadoop can come in handy.

Driscoll said Hadoop is good at handling ETL processes because it can split up the integration tasks among numerous servers in a cluster. He added that using Hadoop to integrate data and stage it for loading into a data warehouse or other database could help justify investments in the technology -- getting its foot in the door for larger projects that take more advantage of Hadoop's scalability.

Of course, leading-edge Internet companies such as Google, Yahoo, Facebook and Amazon.com have been big Hadoop users for years. And new technologies aimed at eliminating some of Hadoop's limitations are becoming available. For example, several vendors have released tools designed to enable real-time analysis of Hadoop data. And a Hadoop 2.0 release that is in the works will make MapReduce an optional element and enable Hadoop systems to run other types of applications.

Ultimately, it's important for IT and business executives to cut through all the hype and understand for themselves where Hadoop could fit in their operations. Stirman said there's no doubt it's a powerful tool that can support many useful analytical functions. But it's still taking shape as a technology, he added.

"There's so much hype around it now that people think it does pretty much anything," Stirman said. "The reality is that it's a very complex piece of technology that is still raw and needs a lot of care and handling to make it do something worthwhile and valuable." 

About the author:
Ed Burns is site editor of SearchBusinessAnalytics. Email him at eburns@techtarget.com and follow him on Twitter: @EdBurnsTT.

This was last published in September 2013

Is your organization using Hadoop?
To use it as centralized storage system of all of our data.
It is an emerging technology and need more in site engineer
want to know more about its application part and its drawbacks
I am currently researching the usefulness or uselessness of Hadoop; it is currently on our roadmap for 4th quarter 2014
As a public agency with 300 disparate databases for over 50 applications serving 12 departments, data warehousing is cost prohibitive and the staff and time needed to try to pull relevant data from those databases doesn't have a viable economic ROI.
We use Emcien to find connections quickly and automatically in our data.
Just wanted to know more about hadoop.
Hadoop is complicated
We are using Hadoop, however, the analytics needs for end-users were not scoped properly up front. So the situation is once it's in Hadoop, it's challenging to use it with existing analytics tool sets or existing analytics staff's data skills.
like to try
Big data is irrelevant to most (normal sized) organisations and overhyped beyond the point of irritation.
we cannot use open source as we deal in govt data and that is very confidential.
nice article
We are getting to know about Hadoop and seeking for a practice case for the implementation.
What is the minimum amount of data to be considered for Hadoop?
too small to need it
We are planning to build a new data warehouse for analytics
we need to learn hadoop
Hype just doesn't cut the mustard.
exploring all the features in open source hadoop
Need Hands On Training
Good article on pros/cons of hadoop.
