Big data vendors should stop dissing data warehouse systems

Wayne Eckerson examines the analytics roles of data warehouses and big data systems and says he's tired of data warehouse bashing by big data vendors.

I've heard so many big data vendors bash data warehouses as a way to justify their new technologies that it's getting annoying. To them, data warehouse systems are monolithic, costly and inflexible, while their technologies are fast, flexible and affordable. "Buy our products," they shout in their shiny collateral, "and we'll save you from data warehousing hell."

As if technology were the problem. Or the solution.

I'll admit there are plenty of data warehousing failures out there. Designing a data warehouse is not easy, and implementing one is even harder. The critics are right -- data warehouses take a long time to build, cost a lot of money and are hard to change. But that doesn't mean we should ditch them.

At its heart, a data warehouse is not a technology or tool. It is primarily a business process that unites an organization in electronic form (i.e., through data) so it can function as a single entity, not a conglomeration of loosely coupled fiefdoms. Without a data warehouse, business executives run blind, making critical decisions with inaccurate data or no data at all.

Although you need technology to implement data warehouses, technology can't harmonize business perceptions and deliver an enterprise view of an organization. Only business people can do that. In fact, getting business people to agree on the definitions of core business entities can be more challenging and time-consuming than creating the technical infrastructure. Instead of blaming technology or technologists for poorly designed or under-performing data warehouses, we should point the finger at executives who fail to provide sufficient leadership, vision and patience to create a common data vocabulary for doing business.

Data warehouse systems supply clean data

Technically, a data warehouse is a repository of clean, integrated and semantically unified data gleaned from major applications and systems in an organization. You can implement a data warehouse with a variety of technologies and tools, from relational databases to master data management hubs and Hadoop. Some technologies are better than others, and no technology is sufficient in and of itself. But that isn't the point. A data warehouse is really an abstraction, a logical representation of clean, vetted data that executives can use to make decisions.

Unfortunately, many in the big data community seem to advocate abandoning data warehouses altogether. Perhaps what they really mean is that they no longer want to use traditional relational databases and business intelligence tools to store and query business data. That's fine -- and welcome. New technologies bring benefits. But that doesn't eliminate the need for clean, integrated and certified data.

Big data vendors need to specify how they plan to deliver enterprise views and standard reports. Unfortunately, most ignore this annoying requirement or make it a small droplet in their big data lake.

The three pillars of an analytical ecosystem

Part of the problem is that the big data community inflates the role of a data warehouse before shooting it dead. The data warehouse is only one of several repositories in a mature analytical ecosystem, which also includes exploration/discovery and event-driven alerting environments (see Figure 1).

Figure 1: Conceptual architecture depicts a mature analytical ecosystem

Simply put, the job of a data warehouse is to help business people monitor existing processes and activities and identify key trends and anomalies; it underpins a reporting and analysis environment that is designed to provide answers to predefined questions. Although a data warehouse supports some degree of analysis, it's not intended to answer new and unanticipated questions. That is the job of the exploration and discovery environment -- the hallmark of the big data movement today. It lets power users mash up new and existing data sets, run complex queries and apply machine learning algorithms to drive new insights. The alerting environment, meanwhile, handles event-driven data feeds from high-volume transactions or real-time processing systems and alerts users or downstream systems when data triggers predefined rules.
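To make the alerting environment a little more concrete, here is a minimal Python sketch of predefined rules being evaluated against an event feed. It is an illustration under stated assumptions, not a description of any vendor's product: the `Event` and `Rule` types, the metric names and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """One record from a transaction or real-time feed (hypothetical shape)."""
    source: str
    metric: str
    value: float


@dataclass(frozen=True)
class Rule:
    """A predefined rule: a name plus a condition that decides whether to alert."""
    name: str
    condition: Callable[[Event], bool]


def evaluate(events: Iterable[Event], rules: list[Rule]) -> Iterator[tuple[str, Event]]:
    """Yield (rule name, event) for every event that triggers a rule."""
    for event in events:
        for rule in rules:
            if rule.condition(event):
                yield rule.name, event


# Hypothetical rules and thresholds, chosen only for illustration.
rules = [
    Rule("large_order", lambda e: e.metric == "order_total" and e.value > 10_000),
    Rule("slow_checkout", lambda e: e.metric == "checkout_seconds" and e.value > 30),
]

# A tiny in-memory stand-in for a high-volume event feed.
feed = [
    Event("web", "order_total", 250.0),
    Event("web", "order_total", 12_500.0),
    Event("pos", "checkout_seconds", 45.0),
]

alerts = list(evaluate(feed, rules))
for rule_name, event in alerts:
    print(f"ALERT {rule_name}: {event.source} {event.metric}={event.value}")
```

In a real deployment the feed would be a stream rather than a list, and the alerts would go to users or downstream systems rather than to the console. The point is only that the rules are defined ahead of time, in contrast to the open-ended questions handled by the exploration and discovery environment.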

Missing from Figure 1 is technology. As I mentioned above, you can implement data warehouse systems (and the other environments) using a variety of technologies and tools. Your choices depend partly on your organization's legacy systems, budget and tolerance for risk. But whatever you decide to use, make sure you understand how it all needs to fit together in a well-designed analytical ecosystem.

Finally, let's not allow big data advocates to denigrate the data warehouse. It plays a vital role in any analytical ecosystem. A data warehouse is the vehicle that delivers an enterprise view of data and drives standard reports and analyses. And who can live without that?

About the author:
Wayne Eckerson is principal consultant at Eckerson Group, which helps business leaders use data and technology to drive better insights and actions. His team provides information and advice on business intelligence, analytics, performance management, data governance, data warehousing and big data. Email him at wayne@eckerson.com.

This was last published in April 2014

4 comments

Good article. I came to Hadoop after 15 years of building data warehouses with the major ETL tools. The business need for an EDW is just as important as ever. Having Hadoop and other big data tools can make the job easier, cheaper and more performant.

It is important to acknowledge that if companies have a successful data warehouse environment, then it needs to be leveraged. But it is not always necessary to have a data warehouse to provide reporting and analytics. In the past, and more so now with new technologies, there have been better alternatives for accomplishing the business objectives with less risk, less cost and a better chance of business adoption.

Which big data vendors are bashing data warehouses?

Comparing Hadoop to a data warehouse is like comparing apples to oranges. The data warehouse is a concept and Hadoop is a technology. You can't compare the two. As a concept of clean and integrated data, the data warehouse is here to stay. In the past, data warehouses have typically run on relational databases. However, over the last couple of years various limitations of the RDBMS have emerged (exploding license costs in the face of growing data volumes, and a poor fit for querying graphs and hierarchies and for ingesting unstructured data types, etc.). At the same time, MPP SQL query engines such as Apache Drill have appeared that now make it possible to query data that sits on Hadoop. I would argue that there are clear signs now that the golden era of the RDBMS and data warehouse appliances is coming to an end. I have written a whole series of posts on the subject if you are interested in all of the details. Data Warehousing in the age of big data. The end of an era? http://sonra.io/data-warehousing-in-the-age-of-big-data-the-end-of-an-era/
