Better Together: Hadoop and Your Enterprise Data Warehouse

You’re getting started with a big data analytics project on Hadoop and are impressed by the cost savings on storage compared with your data warehouse. You’ve read that TrueCar, a company that collects vast volumes of car price data for its online car-buying business, has cut its monthly data storage cost from $19/GB to $0.23/GB.¹ So you’re wondering, should you consider moving all your business intelligence efforts to Hadoop?

No. A data warehouse and Hadoop are suited to different tasks, and Hadoop should not be viewed as a replacement for your enterprise data warehouse. What’s more, Hadoop adopters report some challenges: Some say that Hadoop takes too much effort, and others that it is too slow for real-time analytics.

Hadoop’s MapReduce engine is optimized for batch processing thanks to its ability to distribute simple calculations. However, this method is not well suited to ad hoc, interactive, real-time data discovery or to advanced analytics. Advanced analytics requires the database nodes to communicate with one another to enable interactive processing, a capability that MapReduce lacks. Hadoop also requires a set of skills that differs from those needed for a data warehouse and that may be hard to find in the labor market.²
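The batch pattern described above can be illustrated with a minimal, single-process sketch in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample sales records are assumptions made for illustration. The point it shows is that each map call works on one record in isolation and each reduce call works on one key in isolation, which is what makes the work easy to distribute but leaves no channel for nodes to talk to each other mid-job.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs for one record; needs no other records."""
    sku, amount = record
    yield sku, amount

def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for a single key into one result."""
    return key, sum(values)

def run_job(records):
    # Every map call is independent, so the calls could run on different nodes.
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    # Every reduce call sees only its own key's values.
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical sample data: (SKU, sale amount).
sales = [("A100", 20), ("B200", 5), ("A100", 30)]
totals = run_job(sales)
print(totals)  # {'A100': 50, 'B200': 5}
```

An interactive query that needs partial results from several keys at once would have to wait for the whole job to finish and then start another one, which is the limitation the paragraph above describes.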

Since you will keep your data warehouse and deploy Hadoop alongside it, the best approach is to use the two in complementary fashion. Your enterprise data warehouse should contain structured and curated data, while Hadoop should serve as a sandbox for experimenting with new types of data such as Web logs, text, email and machine data.³ When combined with the traditional data types found in the enterprise data warehouse, these new data types can offer users new insights. Hadoop can also be used as a staging area where data is cleansed and structured before it populates the enterprise data warehouse. This allows the enterprise data warehouse to focus on the data that business users value most.
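As a rough illustration of the staging-area idea, the sketch below cleanses raw Web log lines into structured rows and drops lines that cannot be parsed. It is plain Python rather than a Hadoop job, and the log layout (an Apache-style access log), the `cleanse` function, the field names and the sample lines are all assumptions chosen for the example.

```python
import re
from datetime import datetime

# Assumed layout: Apache-style access log. Real feeds may differ.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def cleanse(line):
    """Return a structured row for one raw log line, or None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match["ip"],
        "timestamp": datetime.strptime(match["ts"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": match["method"],
        "path": match["path"],
        "status": int(match["status"]),
        # A "-" size means no body was sent; store it as 0 bytes.
        "bytes": 0 if match["size"] == "-" else int(match["size"]),
    }

# Hypothetical raw input, including one unusable line.
raw_lines = [
    '203.0.113.7 - - [15/Oct/2014:13:55:36 +0000] "GET /cars/price HTTP/1.1" 200 2326',
    'corrupted line with no structure',
    '198.51.100.4 - - [15/Oct/2014:13:56:01 +0000] "GET /missing HTTP/1.1" 404 -',
]

# Only rows that parsed cleanly would go on to the warehouse load.
rows = [row for row in map(cleanse, raw_lines) if row is not None]
print(len(rows), "of", len(raw_lines), "lines kept")
```

Only the typed, validated rows move on, so the warehouse never has to store or index the raw, malformed input.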

Once you adopt this hybrid approach, you may discover some important strategic insights. For example, if you’re in the retail business, you may combine consumer sentiment data from free-text product reviews and call center notes with structured data such as pricing and SKU numbers. The results could give you new knowledge of customer preferences, leading you to scrap products that are falling flat and add new wares for which customers are clamoring.
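To make the retail example concrete, here is a small sketch that scores free-text reviews and joins the scores to structured catalog data by SKU. Everything in it is hypothetical: the keyword lists stand in for a real sentiment model, and the catalog, reviews and function names (`score`, `sentiment_report`) are invented for illustration.

```python
import re
from collections import defaultdict

# Toy keyword lists; a real project would use a proper sentiment model.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"broke", "poor", "disappointed"}

def score(text):
    """Count positive words minus negative words in one review."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def sentiment_report(catalog, reviews):
    """Join average review sentiment to structured catalog rows by SKU."""
    scores = defaultdict(list)
    for sku, text in reviews:
        scores[sku].append(score(text))
    report = [
        {
            "sku": sku,
            "name": item["name"],
            "price": item["price"],
            "avg_sentiment": sum(scores[sku]) / len(scores[sku]),
        }
        for sku, item in catalog.items()
        if sku in scores
    ]
    # Worst-rated products first, so candidates for removal surface at the top.
    return sorted(report, key=lambda row: row["avg_sentiment"])

# Hypothetical structured data (warehouse side) and free text (Hadoop side).
catalog = {
    "A100": {"name": "Kettle", "price": 29.99},
    "B200": {"name": "Toaster", "price": 45.00},
}
reviews = [
    ("A100", "Love it, great value"),
    ("B200", "Broke after a week, very disappointed"),
    ("A100", "Excellent kettle"),
]

report = sentiment_report(catalog, reviews)
for row in report:
    print(row["sku"], row["name"], row["price"], row["avg_sentiment"])
```

With this sample data the toaster sorts to the top with the lowest average sentiment, the kind of signal that might prompt a closer look at a product that is falling flat.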

Big data expert Tom Davenport chronicles several game-changing discoveries in his book, “Big Data @ Work,” as he explains to John Farrelly.⁴ He found companies that augmented “small data” projects with Hadoop big data initiatives to achieve dramatic results. In one case, Monsanto, which already had plenty of information in the form of structured data about its seeds and plant hybrids, added big data information about climate and soil conditions and made the resulting intelligence available to farmers. Farmers obtained guidance as to what to plant, when to plant it, how much water to use, how many seeds to sow, the best time for herbicide and pesticide applications and when to harvest. Crop yields increased by 10% to 15%.

The shortcomings of Hadoop for real-time analytics can be overcome to a significant degree by the use of in-memory analytics or in-database analytics. This is exactly what SAS High-Performance Analytics (HPA) technology accomplishes. SAS HPA allows complex data exploration, model development and model deployment steps to be processed in-memory or distributed in parallel across a dedicated set of nodes.

Because data can be quickly pulled into memory, requests to run new scenarios or new analytical computations can be handled with much shorter response times. This enables business users to make real-time decisions and to create more accurate models.

Also, because data is stored locally in Hadoop, it can be processed without having to move the data to a separate analytic platform.

In addition to flexibility with regard to data types, Hadoop is not constrained by other common database limitations, such as the number of columns in a single table. Advanced analytics software uses an analytics-based table that can consist of tens of thousands or even hundreds of thousands of columns. The number of variables can have a significant impact on the accuracy of the results, so Hadoop supports advanced analytics particularly well: The data in it can be both wide and deep.

Why is this important? Think about an anti-money-laundering application, which a business analyst at a bank or financial services firm may use to spot patterns of illegal activity. By analyzing transaction patterns in real time, the analyst may discover and stop illegal activity before large losses are incurred.

Both data warehouses and Hadoop will continue to evolve, and perhaps Hadoop will one day become a replacement for the enterprise data warehouse. Data warehouses, for their part, may improve and offer better storage economics, lower latency, higher scalability and support for diverse data structures. But for now, there is a need for both Hadoop and an enterprise data warehouse. When used together, each can enrich and derive value from the data contained in the other, giving you a strategic edge you could not get in any other way.

¹ “Tom Davenport on Hadoop, Big Data, and the Internet of Things,” SAS, October 15, 2014.
² “The Current State of Hadoop in the Enterprise,” International Institute for Analytics and SAS Institute, p. 5.
³ Ibid., footnote #2, p. 6.
⁴ Ibid., footnote #1, SAS.
