This article originally appeared on the BeyeNETWORK.
Customer data integration (CDI). Master data management (MDM). Enterprise information integration (EII). Data warehouse appliances. Customer data integration hubs. With a wave of new technologies and architectural options overwhelming IT practitioners, where does this leave your operational data store (ODS)? In this article, two industry experts weigh in on the future of the operational data store from two different perspectives. Is the operational data store breathing its last, or is it a necessary component of any data warehousing architecture? Where does customer data integration fit? Review Jill and Joyce’s discussion, and decide for yourself.
Jill: So, Joyce, we’re having these new conversations with our clients around integration solutions. You’ve heard them too. Clients always seem to be questioning the elegance of their architectures and the value of their platforms.
Joyce: Yes, I know what you mean.
Jill: That’s why I thought this would be a great dialog to have. I mean, you contributed to the concept of the ODS, you teach a full-day course on it, and you’ve built a bunch of them. Can you talk about its inception?
Joyce: Are you calling me old? While I wasn’t the only one instrumental in developing the operational data store concept, I have definitely built many stores of data that we called operational data stores. The ODS is a store of data that is updated in place and, unlike a data warehouse, does not keep history; it is used to integrate data for tactical reporting. Sometimes this data includes customer, product, and other integrated dimensions required by the business. The ODS has also always been an integrated store of data that fed information back to source systems.
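Joyce’s distinction between an ODS that is updated in place and a warehouse that keeps history can be sketched in a few lines. This is a minimal Python illustration under simplifying assumptions: a single hypothetical customer-address attribute, an in-memory dict standing in for the ODS, and an append-only list standing in for the warehouse. The names are illustrative only and do not come from any product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AddressChange:
    """Hypothetical change event for one customer's address."""
    customer_id: str
    address: str
    effective: date


def apply_to_ods(ods: dict, change: AddressChange) -> None:
    # ODS: volatile, updated in place; the previous address is overwritten.
    ods[change.customer_id] = change.address


def apply_to_warehouse(history: list, change: AddressChange) -> None:
    # Warehouse: nonvolatile; every change is appended, so history is kept.
    history.append(change)


ods: dict = {}
warehouse: list = []
changes = [
    AddressChange("C42", "12 Oak St", date(2006, 1, 15)),
    AddressChange("C42", "98 Elm Ave", date(2006, 9, 1)),
]
for change in changes:
    apply_to_ods(ods, change)
    apply_to_warehouse(warehouse, change)

print(ods)             # only the current address survives
print(len(warehouse))  # both versions are retained
```

After both changes, the ODS can answer “where does C42 live now?”, while the warehouse can still answer “where did C42 live in January?”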
Jill: What did you find to be the main driver of those early ODS projects? I mean, back in the day, we saw that mainframes couldn’t support ad hoc reporting. A lot of them were totally consumed by operational processing. And the data warehouse – as Bill Inmon defined it – was subject-oriented, nonvolatile, etc. So the ODS offered "query-able" data faster and more frequently than the mainframe could.
Joyce: Are you making fun of my age again? Usually, the reporting is operational (tactical) in nature and requires integration that we couldn’t do with our silos of source systems. Because customer, product, cost center, and other financial dimensions are integrated in the ODS (based on business requirements), we sometimes use this store of data to feed other systems. One of those systems can be the data warehouse, and we have also used the ODS to feed integrated, good-quality dimensional data back to the source systems.
Jill: This puts the ODS on the same playing field as a lot of data warehouses. You’re saying that the ODS has complex data integration and multiple subject areas. That makes it sound like the only difference between an ODS and a data warehouse is the quantity of history. If that's the case, why don’t we put everything on the data warehouse?
Joyce: Most relational database management systems can’t balance queries against frequent updates, although databases get faster with each release. The ODS data model also differs from the data warehouse model and holds more tactical, transactional data.
Jill: There’s some overlap in that paradigm with the new breed of CDI products on the market. Take CDI, which is basically the integration, standardization, reconciliation, and propagation of customer master data. CDI hubs were purpose-built to integrate data based on business rules and data quality using off-the-shelf interfaces so that data quality is “baked in” to the technology.
Joyce: I look at customer data integration and master data management as feeders or sources for the ODS and/or data warehouse – components in the architecture that help us with integration instead of writing a zillion extract, transform and load programs to accomplish the required tasks. And MDM and CDI act as integrators across the enterprise, for all sources.
Jill: I agree with that. If you take CDI, it represents the authoritative source of customer data for the enterprise. The CDI hub gets its data from all the operational systems that process customer data. So by that definition, it “feeds” other systems for sure. It’s about propagating the master data to other systems.
Joyce: I think the ODS and MDM can be very complementary. So, whether you call MDM part of your integration initiative, or the ODS or George, who cares?
Jill: I think that part of the issue is the original role of the ODS. It addressed the data reporting and provisioning limitations of data warehouses to provide access to data more quickly than the data warehouse could. Some of those limitations no longer exist, though.
Joyce: I disagree. I think there are still a lot of limitations with some database management systems, and adding customer data integration and master data management can be very confusing for a lot of people. The ODS was created because data warehouse databases could not balance small queries against a handful of records with large, million-row queries at the same time. Some ODSs were built to accomplish a specific process, such as global financial reconciliation for a corporation. This type of ODS resembles any source system I have ever built, and is usually a feeder for the data warehouse. There are still a lot of companies out there that do not understand their data across the enterprise. It is still very difficult to integrate data (not just dimensional, but transactional) from many systems in real time, and any integration has a cost in processing time and resources. What you are missing is that we are talking about transactions in the ODS, not dimensional customer data. So, if a CDI product can integrate all customer data (including transactions), then it should work well for most companies.
Jill: We find that most operational systems are now more “aware” of the need for reporting, so they’ve been built to support more frequent extracts. Plus, maturing ETL functionality has streamlined data migration. So, and this brings me back to some of our recent client conversations, we’re seeing that data warehouse architectures have evolved to support lower-latency data than ever. This data is often just as available as data that might have been in the ODS, so you’ll see a company call its kludgy, tier-2, dirty-read system its ODS. You can’t just dump data off your operational system into a relational table and call it an ODS – but people do. You know one of the clients I’m talking about here.
Joyce: Yes, I know who you mean, and you are right. If I only need the data from one source system, that’s great; but if I still have siloed data stores, they’re not so easy to integrate. The ODS is integrated, volatile, subject-oriented, and refreshed as frequently as technology will allow. However, with the new MDM and CDI products, I see integration happening faster. Remember, we can definitely compost data in real time; however, it is difficult to integrate multiple systems “properly” in real time. I am waiting for that software – it sure will make my life easier. For some of my clients, global integration of information is very difficult. In these instances, the ODS is a data store that integrates global financial, manufacturing, and HR data for consumption by the enterprise – enterprise meaning the data warehouse or any other application system. So the ODS is evolving in their enterprise and continues to be a viable part of their business intelligence architecture. I think the definition of the ODS is evolving too; some of us call it the “new generation ODS.”
Jill: I think this is the crux of all the confusion – the original definition has changed.
Joyce: Yes, the ODS definition has changed over time. MDM and CDI can be part of the overall architecture, and those “hubs” or “engines” can help us with the integration of good-quality data, but the ODS (in some companies) is still required to store and integrate the data until month-end and quarter-end reporting has been completed.
Jill: That’s a clear distinction between CDI and the ODS. There are significant limitations in the areas of conflict resolution and data reconciliation that can’t be addressed by ETL. CDI can apply sophisticated logic (heuristic or probabilistic algorithms) to determine the best data across numerous data sources. But it doesn’t support operational reporting like an ODS would.
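Choosing “the best data across numerous data sources” is often called survivorship. The sketch below shows one simple heuristic rule, under stated assumptions: the records have already been matched to the same customer, and a hypothetical trust ranking of source systems decides which non-empty value wins, with the most recent update breaking ties. It is an illustration only; it is not how any particular CDI product works, and it does not show the probabilistic matching Jill mentions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical trust ranking of source systems: lower number = more trusted.
SOURCE_PRIORITY = {"billing": 0, "crm": 1, "web": 2}


@dataclass
class SourceRecord:
    source: str
    updated: date
    fields: dict  # attribute name -> value; None or "" means missing


def best_value(records: list, attr: str) -> Optional[str]:
    # Only records that actually carry a non-empty value can "survive".
    candidates = [r for r in records if r.fields.get(attr)]
    if not candidates:
        return None
    # Most trusted source wins; among equally trusted sources, the newest update wins.
    # Unknown sources rank below every listed source.
    winner = min(
        candidates,
        key=lambda r: (SOURCE_PRIORITY.get(r.source, len(SOURCE_PRIORITY)),
                       -r.updated.toordinal()),
    )
    return winner.fields[attr]


def golden_record(records: list) -> dict:
    # Build one consolidated record, attribute by attribute.
    attrs = sorted({a for r in records for a in r.fields})
    return {a: best_value(records, a) for a in attrs}


records = [
    SourceRecord("web", date(2006, 5, 1),
                 {"name": "J. Smith", "email": "jsmith@example.com", "phone": ""}),
    SourceRecord("crm", date(2006, 3, 1),
                 {"name": "Jane Smith", "email": None, "phone": "555-0100"}),
    SourceRecord("billing", date(2006, 1, 1),
                 {"name": "Jane A. Smith", "email": "", "phone": None}),
]
print(golden_record(records))
```

Here the name comes from billing (the most trusted source), the email from the web record (the only source that has one), and the phone number from CRM. A rule this simple cannot resolve genuine conflicts between equally trusted sources, which is where the more sophisticated logic Jill describes comes in.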
Joyce: True, integrating data from multiple sources in the ODS has never been easy to program, and that hasn’t changed. The ODS is used not just for reporting, but for integrating transactional data. A transaction contains only a few elements of reference data.
Jill: Yes, but transaction data is an event in time. So with transaction data, it’s not about integration as much as it is about transformation. But I get your point that the ODS stores transaction data too. And that’s not what MDM is about.
Joyce: Right. So why do you think some of your clients are questioning their platforms?
Jill: It seems like the ODS has morphed into something way beyond what it was intended to be. Returning to our ODS client example – whether or not it was an ODS is sort of beside the point: the system wasn’t useful. That client is considering “sunsetting” their ODS so that they can recoup the money that had been funding the platform and, especially, the maintenance that had become labor-intensive. Our clients are always reevaluating their platforms, since every platform requires people to support and maintain it.
Joyce: I think all systems should be assessed periodically. Once a year works for me. Perhaps an assessment would have shown that the ODS was not built on business rules, was not flexible enough to change, and could not evolve into the store of data needed for the future. Hmmmm … do you think they were composting data?
Jill: I don’t think these guys were an exception. I think a lot of companies do this type of thing and call it an ODS. But you should define what you mean by composting data.
Joyce: Sure, it’s when people copy data from the sources into a store they call the ODS without regard to integration. So, basically, you could still have three tables with customer data from three different sources in what they call an ODS. The tables are exact replicas of the sources. Usually, these tables are truncated and reloaded on a daily basis. Sounds more like staging to me!
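Joyce’s definition of composting can be contrasted with even a minimal form of integration. The following Python sketch is illustrative only: the three source systems, their rows, and the use of a trimmed, lower-cased email address as the matching key are all hypothetical assumptions, and real customer integration needs far more matching logic than this.

```python
# Hypothetical extracts from three source systems describing the same customer.
SOURCES = {
    "crm": [{"email": "Jane@Example.com", "name": "Jane Smith"}],
    "billing": [{"email": " jane@example.com", "name": "Jane A. Smith"}],
    "web": [{"email": "jane@example.com ", "name": "J. Smith"}],
}


def compost(ods: dict, sources: dict) -> None:
    # "Composting": truncate, then reload each source table as an exact replica.
    ods.clear()
    for table, rows in sources.items():
        ods[table] = [dict(row) for row in rows]


def match_key(row: dict) -> str:
    # Hypothetical matching rule: trimmed, lower-cased email address.
    return row["email"].strip().lower()


def integrate(sources: dict) -> dict:
    # Integration: rows from every source are grouped under one customer key.
    customers: dict = {}
    for table, rows in sources.items():
        for row in rows:
            customers.setdefault(match_key(row), []).append(table)
    return customers


composted: dict = {}
compost(composted, SOURCES)
integrated = integrate(SOURCES)

print(len(composted), "tables,", sum(len(t) for t in composted.values()), "customer rows")
print(integrated)
```

The composted store still holds three unreconciled copies of one customer, which is exactly Joyce’s complaint; the integrated view at least recognizes them as the same person.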
Jill: There’s a really good Confucius quote that says, “He who learns but does not think is lost. He who thinks but does not learn is in great danger.” So, Joyce, I think we both learned a few things. We might still disagree on some of the finer points, but are we still friends?
Joyce: Yep! And for those who define the ODS as a staging area – that definition of the ODS is dead, and always has been! However, the evolving definition of the ODS is alive and well, and continues to change for many companies. The ODS has always been a place of integration, and will probably continue to be a component in many architectures. When corporations need fast, integrated master data (including customer data), they will be looking at alternative solutions. By the way – loved the new book!
Jill: Thanks, Joyce. And I love that we can have a healthy dialog about this stuff. Hopefully our clients can too!
Jill is a partner and co-founder of Baseline Consulting, a technology and management consulting firm specializing in data integration and business analytics. Jill is the author of three acclaimed business books, the latest of which is Customer Data Integration: Reaching a Single Version of the Truth, co-authored with Evan Levy. Her blog, Inside the Biz, focuses on the business value of IT.
Joyce is the president of DBTech Solutions, and a data warehouse industry veteran. She specializes in metrics-driven tool selection involving information deployment technologies including ETL, data profiling, data quality, databases, and metadata products. Besides her data warehousing and operational data store experience, she has developed a wide variety of business intelligence and analytics applications. She is co-author – with Bill Inmon, Dan Meers, and Bob Terdeman – of Data Warehousing for e-Business, published by John Wiley & Sons.