Better Together: Hadoop and Your Enterprise Data Warehouse

You’re getting started with a big data analytics project on Hadoop and are impressed by the cost savings on storage compared with your data warehouse. You’ve read that TrueCar, a company that collects vast volumes of car price data for its online car-buying business, has cut its monthly data storage cost from $19/GB to $0.23/GB.¹ So you’re wondering: should you move all your business intelligence efforts to Hadoop?

No. A data warehouse and Hadoop are each suited to different tasks, and Hadoop should not be viewed as a replacement for your enterprise data warehouse. What’s more, Hadoop adopters report some challenges: some say that Hadoop takes too much effort or is too slow for real-time analytics.

Hadoop’s MapReduce engine is optimized for batch processing because it distributes simple calculations across many nodes. However, this method is not well suited to ad hoc, interactive, real-time data discovery or to advanced analytics. Advanced analytics requires the database nodes to communicate with one another to enable interactive processing, a capability MapReduce lacks. Finally, Hadoop requires a set of skills that differs from those needed for a data warehouse and that may be hard to find in the labor market.²
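To see why MapReduce suits batch work, consider a single-process sketch of the pattern. A map step emits key/value pairs from each record independently. A shuffle then groups those values by key, and a reduce step aggregates each group. This is a toy illustration in plain Python, not Hadoop’s Java API, and the sample log records use a hypothetical "<ip> <page>" format. Because each map call sees only its own record, jobs like this parallelize easily. They are a poor fit, however, for algorithms that need nodes to exchange results repeatedly.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Each record is processed independently: emit (page, 1) for a log
    # line in the hypothetical form "<ip> <page>"; skip malformed lines.
    parts = record.split()
    if len(parts) == 2:
        yield parts[1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all emitted values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Aggregate one key's values; here, a simple count.
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


logs = ["10.0.0.1 /home", "10.0.0.2 /cart", "10.0.0.3 /home", "malformed"]
print(run_job(logs))  # {'/home': 2, '/cart': 1}
```

In a real cluster, the map and reduce calls would run on different machines. The shuffle would move data over the network, and an iterative algorithm would have to chain one such job after another.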

Since you will keep your data warehouse and deploy Hadoop alongside it, the best approach is to use the two in a complementary fashion. Your enterprise data warehouse should contain structured and curated data, while Hadoop should serve as a sandbox for experimenting with new types of data such as Web logs, text, email and machine data.³ When combined with the traditional data types found in the enterprise data warehouse, these new data types can offer users new insights. Hadoop can also serve as a staging area where data is cleansed and structured before it is loaded into the enterprise data warehouse. This lets the enterprise data warehouse focus on the data that business users value most.
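Using Hadoop as a staging area means turning raw, semi-structured records into clean, typed rows before they reach the warehouse. The following is a minimal sketch of that step. It assumes a hypothetical comma-separated Web log format (timestamp, customer ID, URL) and a hypothetical target row called `PageView`. A real pipeline would run this logic as a distributed job over files in HDFS rather than over an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PageView:
    # Hypothetical warehouse row: one structured, typed page view.
    viewed_at: datetime
    customer_id: int
    url: str


def cleanse(raw_line: str) -> Optional[PageView]:
    """Parse one raw log line; return None if it fails validation."""
    fields = [f.strip() for f in raw_line.split(",")]
    if len(fields) != 3:
        return None
    ts, cust, url = fields
    try:
        viewed_at = datetime.fromisoformat(ts)
        customer_id = int(cust)
    except ValueError:
        return None  # unparseable timestamp or customer ID
    if not url.startswith("/"):
        return None
    # Lowercasing URLs is an assumed normalization rule for this sketch.
    return PageView(viewed_at, customer_id, url.lower())


raw = [
    "2014-10-15T09:30:00, 42, /Products/SKU-100",
    "not a timestamp, 42, /home",
    "2014-10-15T09:31:00, abc, /home",
]
rows = [row for line in raw if (row := cleanse(line)) is not None]
print(rows)
```

Only the first line survives. The other two are rejected in staging instead of polluting the warehouse.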

Once you adopt this hybrid approach, you may discover some important strategic insights. For example, if you’re in the retail business, you may combine consumer sentiment data from free-text product reviews and call center notes with structured data such as pricing and SKU numbers. The results could give you new knowledge of customer preferences, leading you to scrap products that are falling flat and add new wares for which customers are clamoring.
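Here is a toy version of that combination. The code scores free-text reviews with a simple word-list sentiment measure and averages the scores per SKU. It then joins the result to structured pricing data by SKU. The word lists, SKUs, prices and reviews are all hypothetical. Counting lexicon words is also a deliberately crude stand-in for a real text analytics model.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon.
POSITIVE = {"love", "great", "excellent"}
NEGATIVE = {"broke", "poor", "disappointed"}


def sentiment(text: str) -> int:
    # +1 per positive word, -1 per negative word, ignoring punctuation.
    words = [w.strip(".,!?") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Structured data, as it might come from the warehouse (hypothetical).
prices = {"SKU-100": 19.99, "SKU-200": 49.99}

# Unstructured data, as it might land in Hadoop (hypothetical).
reviews = [
    ("SKU-100", "Love it, great value"),
    ("SKU-200", "Handle broke after a week, poor quality"),
    ("SKU-200", "Disappointed"),
]

scores: dict[str, list[int]] = defaultdict(list)
for sku, text in reviews:
    scores[sku].append(sentiment(text))

# Join on SKU: average sentiment alongside price for each reviewed product.
report = {
    sku: {"price": price, "avg_sentiment": sum(scores[sku]) / len(scores[sku])}
    for sku, price in prices.items()
    if sku in scores
}
print(report)
```

A persistently negative average on one SKU is the kind of signal that might prompt a closer look at whether to drop that product.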

In his book “Big Data @ Work,” big data expert Tom Davenport chronicles several game-changing discoveries, as he explains in an interview with John Farrelly.⁴ He found companies that augmented “small data” projects with Hadoop big data initiatives to achieve dramatic results. In one case, Monsanto, which already had plenty of structured data about its seeds and plant hybrids, added big data about climate and soil conditions and made the resulting intelligence available to farmers. Farmers obtained guidance on what to plant, when to plant it, how much water to use, how many seeds to sow, the best time for herbicide and pesticide applications and when to harvest. Crop yields increased by 10% to 15%.

The shortcomings of Hadoop for real-time analytics can be largely overcome with in-memory analytics or in-database analytics. This is the approach taken by SAS High-Performance Analytics (HPA) technology. SAS HPA allows complex data exploration, model development and model deployment steps to be processed in-memory or distributed in parallel across a dedicated set of nodes.

Because data can be quickly pulled into memory, requests to run new scenarios or new analytical computations can be handled much faster. This enables business users to make real-time decisions, and it lets analysts iterate on models quickly enough to build more accurate ones.
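The speedup comes from reading the data once and then answering many questions against the in-memory copy. The sketch below illustrates that generic idea and is not SAS HPA’s interface. `load_sales` is a hypothetical loader standing in for a slow read from disk, and its sales values are made up. A call counter shows that several what-if scenarios trigger only one load.

```python
from functools import lru_cache

LOAD_CALLS = 0


@lru_cache(maxsize=None)
def load_sales() -> tuple[float, ...]:
    # Hypothetical stand-in for a slow scan of data on disk; the cache
    # keeps the result in memory after the first call.
    global LOAD_CALLS
    LOAD_CALLS += 1
    return (120.0, 80.0, 200.0, 150.0)


def revenue_scenario(price_change: float) -> float:
    # What-if: total revenue if every sale's value changed by price_change
    # (a fraction, e.g. 0.05 for +5%), ignoring any effect on demand.
    return sum(v * (1 + price_change) for v in load_sales())


for change in (-0.10, 0.0, 0.05):
    print(f"{change:+.0%}: {revenue_scenario(change):.2f}")

print("loads from storage:", LOAD_CALLS)  # loads from storage: 1
```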

Also, because the data already resides in Hadoop, it can be processed in place, without first being moved to a separate analytics platform.

In addition to being flexible about data types, Hadoop is not constrained by other common database limitations, such as the number of columns in a single table. Advanced analytics software often works from an analytic base table, which can have tens of thousands or even hundreds of thousands of columns. The number of variables can significantly affect the accuracy of the results, and data in Hadoop can be both wide (many columns) and deep (many rows). For both reasons, Hadoop supports advanced analytics particularly well.
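An analytic base table is typically built by pivoting many narrow, event-level records into one wide row per entity, with a column for each derived variable. The sketch below shows that pivot in plain Python on hypothetical transaction events. With real data, creating one column per (category, month) pair or per product is how the column count grows into the tens of thousands.

```python
from collections import defaultdict

# Hypothetical event-level records: (customer_id, category, month, amount).
events = [
    (1, "grocery", "2014-09", 52.10),
    (1, "fuel", "2014-09", 40.00),
    (1, "grocery", "2014-10", 61.35),
    (2, "travel", "2014-10", 899.00),
]


def build_abt(rows):
    """Pivot events into one wide row per customer, one column per feature."""
    table: dict[int, dict[str, float]] = defaultdict(dict)
    columns: set[str] = set()
    for customer, category, month, amount in rows:
        col = f"spend_{category}_{month}"  # feature name encodes the pivot keys
        table[customer][col] = table[customer].get(col, 0.0) + amount
        columns.add(col)
    ordered = sorted(columns)
    # Fill absent features with 0.0 so every row has the same width.
    wide = {
        customer: [feats.get(col, 0.0) for col in ordered]
        for customer, feats in table.items()
    }
    return ordered, wide


cols, abt = build_abt(events)
print(len(cols), "columns:", cols)
print(abt[1])  # [40.0, 52.1, 61.35, 0.0]
```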

Why is this important? Think about an anti-money-laundering application, which a business analyst at a bank or financial services firm may use to spot patterns of illegal activity. By analyzing transaction patterns in real time, the analyst may discover and stop illegal activity before large losses are incurred.
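One pattern such an application might look for is structuring. Here, the same account makes several cash deposits in a short period, each just under a reporting threshold. The sketch below checks each incoming deposit against a per-account sliding window as it arrives. The threshold, the "just under" margin, the window length and the alert count are all hypothetical parameters chosen for illustration. They are not regulatory values or rules from any particular anti-money-laundering product.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

THRESHOLD = 10_000.0         # hypothetical reporting threshold
NEAR_MARGIN = 0.10           # "just under" = within 10% below the threshold
WINDOW_SECONDS = 24 * 3600   # hypothetical look-back window
MIN_HITS = 3                 # hypothetical count that triggers an alert


@dataclass
class Deposit:
    account: str
    timestamp: int  # seconds since epoch; assumed to arrive in time order
    amount: float


class StructuringDetector:
    def __init__(self) -> None:
        # Per-account timestamps of recent near-threshold deposits.
        self.recent: dict[str, deque] = defaultdict(deque)

    def observe(self, d: Deposit) -> bool:
        """Process one deposit as it arrives; return True if it raises an alert."""
        if not (THRESHOLD * (1 - NEAR_MARGIN) <= d.amount < THRESHOLD):
            return False
        window = self.recent[d.account]
        window.append(d.timestamp)
        # Evict deposits that have fallen out of the look-back window.
        while window and d.timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) >= MIN_HITS


detector = StructuringDetector()
stream = [
    Deposit("A", 0, 9_500.0),
    Deposit("B", 600, 2_000.0),
    Deposit("A", 3_600, 9_800.0),
    Deposit("A", 7_200, 9_900.0),
]
alerts = [d for d in stream if detector.observe(d)]
print([(d.account, d.timestamp) for d in alerts])  # [('A', 7200)]
```

Each deposit does a constant amount of work apart from evicting expired entries, so this style of check can keep pace with a live transaction stream.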

Both data warehouses and Hadoop will continue to evolve. Hadoop may eventually become a replacement for the enterprise data warehouse, or data warehouses may improve to offer better storage economics, lower latency, higher scalability and support for diverse data structures. But for now, there is a need for both Hadoop and an enterprise data warehouse. When used together, each can enrich the data in the other and derive value from it, giving you a strategic edge you could not get any other way.

¹ “Tom Davenport on Hadoop, Big Data, and the Internet of Things,” SAS, October 15, 2014.
² “The Current State of Hadoop in the Enterprise,” International Institute for Analytics and SAS Institute, p. 5.
³ Ibid., p. 6.
⁴ “Tom Davenport on Hadoop, Big Data, and the Internet of Things,” SAS, October 15, 2014.
