This article originally appeared on the BeyeNETWORK.
There is a slippery slope on which one embarks when seeking the single source of truth. While my alliterative business intelligence (BI) philosophy may sound like a tacky cookie fortune, this is my attempt to introduce a Zen-like miasma to this article. In conversations with our clients, I have often been told that one of the organization’s justifications for building a data warehouse is to develop a single source of truth to feed other analytical applications. In the heady days of the late-1990s Internet boom, this kind of business case may have been enough to fund a data warehouse project. But in today’s comparatively more somber business environment, technology is less likely to drive a business decision to invest in information infrastructure. That doesn’t denigrate the conceptual value of a single source of truth, but it should make one think a bit more about what that concept really means.
What is a single source of truth? Consider how application systems have historically evolved. Workgroup computing in the 1980s allowed for vertical application development, since each division within an organization would allocate budget for the purchase of a minicomputer for batch (and limited direct transactional) operations, and desktop computers with simple desktop applications for individual processes, such as customer management. In this environment, a loosely coupled, highly distributed and uncoordinated information infrastructure easily took root. In turn, this lack of synchrony introduced the kinds of consistency issues that plague enterprise information architectures today. Because each worker was wrapped up in his or her own information environment, one managed one’s own single source of truth.
Today, collaboration is a frequently used word whose overloaded meaning includes the integration of information across the enterprise. For analytical purposes, that has implied collecting, aggregating, transforming and delivering data into a data warehouse, which then would feed business-oriented analytical applications. But I see a more generalized view taking shape that incorporates the concepts of collaboration for both analytical and operational purposes. In this frame of reference we collate our data into different conceptual data sets: transactional, analytical and reference data, or what is now sometimes referred to as “master data.”
In my book on data quality, Enterprise Knowledge Management: The Data Quality Approach, I described how proactive data quality techniques coupled with metadata management would help in collecting what I then called “master reference data,” consisting of the base entities manipulated within the information environment (both operational and analytical), such as “vendors,” “products,” “customers,” “locations,” etc. What is now happening is that automated tools such as data profiling help in tracking down latent reference information that can be accumulated into a master data management scheme. That scheme then contributes to a conceptual “single source of truth,” which in turn increases the value of all of the information in the enterprise, and not just data aggregated for the sake of building a bunch of data marts.
Therein lies the more significant value, but also the danger. One might be building a repository to capture the reference information, but if it is going to be used for more than just an ETL process, then there is a greater need for gaining consensus among all those data stewards or custodians whose applications are contributing to and participating in the process. Yet within a specific application domain, the information requirements are limited to supporting the business drivers for that application domain. Collecting reference data into a single master repository means collecting the business information requirements also. We have gotten really good at developing universal data models to capture the data, but how good are we at managing the way that data is being used and at accumulating information requirements?
Let’s look at an example in the health insurance industry. A submitted claim is likely to be paid as long as most of the information about the situation is correct. So if the diagnostic code is missing, or is assigned some default coding for any reason, as long as a staff member participating in the claims process understands what was done and what service was provided, the provider will be paid. The same claims information is later used to analyze how often each illness is seen and treated, to help determine the statistically optimal treatment approach (e.g., the low-cost treatment that most often resolves the problem). In that second use, it is much more critical that the diagnostic code both be there and be accurate, because skewing the analysis will skew the result, which might lead to less effective treatment recommendations. So in this example, the approach to master data management must incorporate the statement of the requirements as much as the data itself.
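To make the claims example concrete, here is a minimal sketch in Python of the same record being checked against two different sets of requirements. It is an illustration only, not a description of any real claims system: the field names, the default code "UNSPEC" and the two rule sets are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Claim:
    # Hypothetical fields, chosen only for this illustration.
    claim_id: str
    service: str
    amount: float
    diagnosis_code: Optional[str]


# Hypothetical placeholder assigned when no real diagnosis is recorded.
DEFAULT_CODE = "UNSPEC"

Rule = Callable[[Claim], bool]

# Each use of the data states its own requirements as named rules.
REQUIREMENTS: Dict[str, Dict[str, Rule]] = {
    "payment": {
        "service is described": lambda c: bool(c.service),
        "amount is positive": lambda c: c.amount > 0,
    },
    "treatment_analysis": {
        "service is described": lambda c: bool(c.service),
        "diagnosis code is present": lambda c: c.diagnosis_code is not None,
        "diagnosis code is not a default": lambda c: c.diagnosis_code != DEFAULT_CODE,
    },
}


def violations(claim: Claim, use: str) -> List[str]:
    """Return the names of the rules that the claim fails for the given use."""
    return [name for name, rule in REQUIREMENTS[use].items() if not rule(claim)]


if __name__ == "__main__":
    claim = Claim("C1", "office visit", 120.0, DEFAULT_CODE)
    print("payment:", violations(claim, "payment"))
    print("treatment_analysis:", violations(claim, "treatment_analysis"))
```

A claim carrying the default code passes every payment rule yet fails the analytical ones, which is the sense in which the requirements belong alongside the data.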
In the best of all possible worlds, requirements are stated in a formal manner that can be manipulated and published to the data stewardship community. In fact (not surprisingly), the formal statement of enterprise master information requirements is itself master data, which means that it can be managed as an enterprise asset as well. But the data management industry has not spent 40 years developing models for formally representing requirements, as it has for representing data, and this introduces a potential gap in the process.
Building a single source of truth is valuable when it contributes to the organization being better positioned to achieve its business objectives. Abstracting the tasks is reasonable for a development and implementation phase, but don’t do it before determining how it adds value either operationally or analytically. And when developing this approach, keep in mind that the aggregate collection of requirements is cumulative, and may introduce new constraints that might not have existed at all before the process was initiated.
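The cumulative effect of pooled requirements can be sketched in a few lines of Python. This is a toy model under stated assumptions: each application is assumed to declare, per field, the fraction of missing values it can tolerate, and the application names, field names and numbers are all hypothetical.

```python
from typing import Dict

# Hypothetical: tolerated fraction of missing values, per field, per application.
APP_REQUIREMENTS: Dict[str, Dict[str, float]] = {
    "claims_payment": {"provider_id": 0.0, "diagnosis_code": 0.20},
    "treatment_analysis": {"diagnosis_code": 0.0, "patient_age": 0.05},
}

# A field that an application does not mention is treated as unconstrained.
UNCONSTRAINED = 1.0


def merge_requirements(per_app: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Combine all applications' requirements, keeping the strictest per field."""
    merged: Dict[str, float] = {}
    for fields in per_app.values():
        for field, tolerance in fields.items():
            merged[field] = min(tolerance, merged.get(field, UNCONSTRAINED))
    return merged


def newly_constrained(app: str, per_app: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Fields on which the shared repository is stricter than the app was alone."""
    merged = merge_requirements(per_app)
    own = per_app[app]
    return {
        field: tolerance
        for field, tolerance in merged.items()
        if tolerance < own.get(field, UNCONSTRAINED)
    }


if __name__ == "__main__":
    print(merge_requirements(APP_REQUIREMENTS))
    print(newly_constrained("claims_payment", APP_REQUIREMENTS))
```

In this toy model the payment application inherits a stricter rule for diagnosis codes and a rule for a field it never used, constraints that did not exist for it before the requirements were pooled.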