I see two very different schools of thought that are both called master data management. One school of thought holds that master data management is the art of taking care of reference data across the corporation. There is reference data, and it needs its own management for both operational transaction processing and decision support system (DSS)/data warehouse/business intelligence (BI) processing. That’s one way of thinking about master data management.
The other way of thinking about master data management is in terms of subject area databases. These subject area databases include master customer files, master account files, master transaction files and the like. These master files contain what are known as the “golden records,” where data has the true system of record established and managed across the corporation.
So there are then two ways of thinking about master data – as reference data and as subject area data. These two forms of data are very different. There is very little overlap or commonality between these two forms of data. So there really are two very different subjects here, although they sound a lot alike from a superficial perspective.
From the standpoint of data management, each form of master data management raises its own very real issues.
When it comes to managing master reference data, there is the issue of managing data over different environments. Unlike other data, master reference data must be managed in both the operational and the analytical environments simultaneously. This is not an impossible task, but it does require some thought. Master reference data can be duplicated, updates can be stored and assigned a place to be retransacted (always a dicey proposition), and in some technological environments master reference data can even be shared. So managing access to master reference data across different environments requires some thought.
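The duplication option above can be sketched in a few lines. This is a minimal, hypothetical sketch (the store names and the update function are illustrative, not any particular product): a single reference-data change is propagated to both the operational copy and the analytical copy so the two environments stay in agreement.

```python
# Hypothetical sketch: one reference-data update propagated to both
# environments so the duplicated copies do not drift apart.

operational_store = {}   # copy used by transaction processing
analytical_store = {}    # copy used by the warehouse/BI side

def update_reference(code, description):
    """Apply one reference-data change to every environment's copy."""
    for store in (operational_store, analytical_store):
        store[code] = description

update_reference("US", "United States")
update_reference("DE", "Germany")

print(operational_store == analytical_store)  # the two copies agree
```

The dicey part the text alludes to is what happens when one copy is updated outside this path; a real implementation would need reconciliation or a single authoritative feed.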
In addition, there is the issue of archiving master reference data. Master reference data can be stored on a snapshot basis or on a changed data capture basis, where all transactions against the master reference data are stored for the possibility that the data may need to be retransacted. Either option has its drawbacks. Storing reference data on a snapshot basis means that some states of the master data may be lost. For example, suppose that snapshots of master reference data are taken on January 1 and July 1. If a transaction is run against the master data on February 1 and another transaction is run against the same master data on March 15, the snapshots will miss the values the data held between February 1 and March 14.
The opposite approach is to store all transactions that have run against master data and retransact the transactions if it is desired to bring the master reference data up to date. This approach does not miss any transactions that have occurred, but is complex in the best of circumstances.
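The trade-off between the two archiving approaches can be made concrete with a small sketch. Everything here is hypothetical (the field names and values are made up to mirror the January/July example): the snapshot archive captures only the states that happen to exist on snapshot dates, while the transaction log can be replayed to recover every intermediate state.

```python
import copy

# Hypothetical sketch: periodic snapshots versus a changed-data-capture
# log for the same master reference data.

master = {"rate": 100}
snapshots = []          # snapshot-based archive
transaction_log = []    # changed-data-capture archive

def apply(txn):
    """Run a transaction against the master data, logging it first."""
    transaction_log.append(txn)
    master[txn["key"]] = txn["value"]

snapshots.append(copy.deepcopy(master))   # January 1 snapshot
apply({"key": "rate", "value": 110})      # February 1 transaction
apply({"key": "rate", "value": 120})      # March 15 transaction
snapshots.append(copy.deepcopy(master))   # July 1 snapshot

# The snapshots hold only 100 and 120; the February state (110) is lost.
# Replaying ("retransacting") the log recovers every intermediate state:
state, history = {"rate": 100}, []
for txn in transaction_log:
    state[txn["key"]] = txn["value"]
    history.append(state[txn["key"]])
print(history)  # [110, 120]
```

The replay loop is the simple case; the complexity the text warns about comes from ordering, failed transactions, and logs that span schema changes.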
The data management issues that relate to the management of master subject areas are quite different. When it comes to creating and managing data for master subject areas, there are (at the least!) the following issues:
- Is the design of the master data done correctly? Is the data at the proper level of granularity? Has a data model been constructed properly? Have relationships with other master data been properly established?
- Has the data been updated properly? Completely? Accurately? With the proper system of record?
- Is the master data timely?
- Is the master data able to be changed? When requirements change, is the master data also able to be changed?
- What relationship is there between the master data and summarized or aggregated forms of it?
- How can the master data be accessed? Audited?
- What metadata is there for master data?
- Is there an audit trail describing the source of the master data?
- Are there times of the day when master data cannot be accessed?
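Several of the questions above - the system of record, the audit trail, timeliness - amount to metadata that must travel with each golden record. A minimal sketch, assuming a hypothetical record structure (the class and field names are illustrative, not from any MDM product):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of a "golden record": the master value plus the
# lineage metadata the checklist above asks for.

@dataclass
class GoldenRecord:
    key: str
    value: dict
    system_of_record: str   # which source system is authoritative
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    audit_trail: list = field(default_factory=list)

    def update(self, new_value, source):
        """Replace the value, preserving the prior version in the audit trail."""
        self.audit_trail.append(
            (self.updated_at, self.system_of_record, self.value))
        self.value, self.system_of_record = new_value, source
        self.updated_at = datetime.now(timezone.utc)

rec = GoldenRecord("CUST-001", {"name": "Acme Corp"}, "CRM")
rec.update({"name": "Acme Corporation"}, "ERP")
print(len(rec.audit_trail))  # 1 prior version retained
```

Keeping the audit trail on the record itself is only one design choice; many shops keep lineage in a separate metadata store instead.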
In short, there are plenty of data management issues that surround master data when the master data is of the subject area genre.
So what’s the problem here? The problem is that when you have an article, conference presentation or book on MDM, you need to specify exactly what kind of master data you are considering. Otherwise, you get into some very confusing discussions.
But there is another problem. That problem is that for master data of the subject variety, there’s nothing new. In the first rendition of data warehousing, it was clearly stated that there needed to be subject areas. There needed to be carefully defined granular data for the subject area. There needed to be a carefully defined system of record. In short, there needed to be a foundation of subject area data – a bunch of golden records. So if that’s what you mean by MDM, then MDM is just a new rendition of something that has been around for a while. Perhaps subject area MDM is for those people who just couldn’t figure out how to make data warehousing a reality in their company.