Temporal data reality: In BI, time is of the essence

To fully understand their analytics data, organizations need to incorporate temporal data capabilities into data warehouses and business intelligence systems, consultant Barry Devlin says.

In popular physics, time is the fourth dimension. In the world of business computing, it's more like an invisible presence. Business systems mostly operate in some sort of existential present tense, at least in the way that most relational databases represent time. But applications typically must have a fuller understanding of time built into them. Every application designer has programmed a unique set of timestamps, flags, and update and delete algorithms to address business questions as simple as this: "How do we recall yesterday's order details after the order is updated today?"

Such needs are so common that computer scientists invented the concept of temporal databases as far back as the 1980s, leading to the proposal of the TSQL2 extension to the SQL standards committee in 1994. In the absence of any agreement, Teradata finally implemented TSQL2 for data warehousing purposes in 2010. The formal adoption of a standard for (some) temporal data characteristics finally occurred as part of SQL:2011, and IBM implemented the temporal features in its DB2 database in 2012. Oracle offered "multitemporal" support in Oracle Database 12c the following year.

Both approaches are referred to as bitemporal because they define two time periods that can be applied automatically to every row in a database. While the implementations differ in detail, in both cases one time period reflects changes in the things a business is interested in, while the second provides a chronological record of changes to the database itself. A key characteristic of bitemporal data is that records are never overwritten or deleted -- they're simply marked as no longer current. This approach preserves a complete data history (with a few limited exceptions).
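The mechanics can be sketched in a few lines. This is a minimal illustration only, using SQLite and hand-managed columns; the column names (valid_from, valid_to, tx_from, tx_to) and the open-ended date sentinel are illustrative conventions, not any vendor's syntax. Bitemporal databases such as DB2 manage these periods automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE price (
        product    TEXT,
        amount     REAL,
        valid_from TEXT,  -- when the fact held true in the business
        valid_to   TEXT,
        tx_from    TEXT,  -- when the row was recorded in the database
        tx_to      TEXT
    )
""")

HIGH = "9999-12-31"  # conventional marker for an open-ended period

# Day 1: record a price believed valid from Jan 1 onward.
conn.execute(
    "INSERT INTO price VALUES ('widget', 10.0, '2015-01-01', ?, '2015-01-05', ?)",
    (HIGH, HIGH))

# Later, a correction arrives. Never overwrite: close the old row's
# transaction-time period, then insert the corrected version.
conn.execute(
    "UPDATE price SET tx_to = '2015-01-10' WHERE product = 'widget' AND tx_to = ?",
    (HIGH,))
conn.execute(
    "INSERT INTO price VALUES ('widget', 12.0, '2015-01-01', ?, '2015-01-10', ?)",
    (HIGH, HIGH))

# Both versions survive: what we believed, and when we believed it.
rows = conn.execute(
    "SELECT amount, tx_from, tx_to FROM price ORDER BY tx_from").fetchall()
print(rows)
```

Because no row is ever destroyed, a query can reconstruct the database as it stood at any past moment, which is exactly the guarantee the bitemporal model provides.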

Business intelligence, with its emphasis on history, demands close attention to temporal concepts. Its raison d'être is to enable business users to explore past business events to understand problems, see trends, support decisions and so on. In many cases, users get point-in-time views of the business at defined times, such as the daily or month-end close. However, as business operations have become increasingly real-time in nature, the need for a continuous history that provides an ongoing, complete and accurate record of transactions and business performance has become paramount. Temporal -- and in particular, bitemporal -- data is thus central to effective BI processes and should be a core part of any data warehouse or data mart.

Time for a temporal workaround?

So, what did builders of data warehouses do before 2010? Or still do, if their implementation is on one of the relational databases lacking bitemporal support? There's a choice of three different methods -- though two of them might be living on borrowed time.

The first option is to implement timestamps and other temporal control fields within the database, and to provide appropriate templates for adding the required temporal logic to otherwise non-temporal SQL code. A broad bitemporal approach of this sort was used in IBM's internal business data warehouse starting in the late 1980s, and is described in my 1996 book Data Warehouse: From Architecture to Implementation. Used with care, this approach can provide a valid representation of the temporal aspects of historical data stored in normalized or partially denormalized relational databases. The care required is that developers and other SQL query writers must use the temporal templates in all queries and actions.
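The template discipline looks something like the following sketch (SQLite, with hypothetical table and column names): every query writer must remember to apply the same period predicate by hand, because the database itself enforces nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, qty INTEGER, tx_from TEXT, tx_to TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 5, "2015-03-01", "2015-03-02"),   # yesterday's version, superseded
    (1, 8, "2015-03-02", "9999-12-31"),   # current version
])

# The template that must appear, verbatim, in every query against the table:
AS_OF_TEMPLATE = """
    SELECT order_id, qty FROM orders
    WHERE tx_from <= :as_of AND tx_to > :as_of
"""

# "How do we recall yesterday's order details after the order is updated today?"
yesterday = conn.execute(AS_OF_TEMPLATE, {"as_of": "2015-03-01"}).fetchone()
today = conn.execute(AS_OF_TEMPLATE, {"as_of": "2015-03-02"}).fetchone()
print(yesterday, today)
```

The weakness is evident: one query that omits the predicate silently mixes current and superseded rows, which is why the approach works only "used with care."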

The second method uses snapshots. As the name implies, a snapshot is a point-in-time copy of data reflecting the business status at the moment it was captured. In traditional data warehousing, the moment in question is usually the close of the business day. Daily snapshots are then conceptually lined up one after the other to create a business history. In Building the Data Warehouse, first published in 1992, William H. Inmon discusses this approach, as well as a simple form of temporal timestamping, as being among the many possible data structures within warehouses. But snapshots are a terrible approach to temporal data, best confined to conceptual thought experiments rather than real data warehouses. They offer no guarantee of capturing all the changes in data; in fact, they actually guarantee that some changes will be missed.
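A toy example (with hypothetical data) makes the guaranteed loss concrete: any value that changes more than once between snapshots leaves no trace of its intermediate states.

```python
# An order's status changes several times during one business day.
intraday_changes = [
    ("09:00", "received"),
    ("11:30", "shipped"),
    ("11:45", "cancelled"),  # cancelled before the close of business...
    ("16:00", "received"),   # ...then re-entered
]

# The end-of-day snapshot captures only the final state.
snapshot = {"status": intraday_changes[-1][1]}

captured = {snapshot["status"]}
actual = {status for _, status in intraday_changes}
missed = actual - captured  # states the snapshot never recorded
print(missed)
```

Here the shipment and the cancellation vanish entirely from the warehouse's "history," even though both were real business events.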

The third option, introduced by Ralph Kimball in a 1996 paper, is called slowly changing dimensions. Perhaps the kindest thing that can be said for the SCD approach is that it is firmly founded in practice. Its structures and evolution show a distinct absence of any set of fundamental principles about the temporal nature of data. In his 2014 book Bitemporal Data: Theory and Practice, Tom Johnston demonstrates by way of a comprehensive example that much historical data is lost using this approach, while maintenance of SCD schemas can be very laborious and expensive.
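For readers unfamiliar with the mechanics, here is a bare sketch of a Type 2 SCD update on a hypothetical customer dimension. Each attribute change closes the current row and inserts a new version under a fresh surrogate key; attributes handled as Type 1, by contrast, are simply overwritten, and their history is gone, the kind of loss Johnston's example formalizes.

```python
dimension = []


def scd2_update(dim, natural_key, city, effective_date):
    """Close the customer's current row and insert a new version (Type 2)."""
    for row in dim:
        if row["customer_id"] == natural_key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = effective_date
    dim.append({
        "surrogate_key": len(dim) + 1,  # fresh key for each version
        "customer_id": natural_key,
        "city": city,
        "start_date": effective_date,
        "end_date": None,
        "is_current": True,
    })


scd2_update(dimension, "C1", "Boston", "2015-01-01")
scd2_update(dimension, "C1", "Chicago", "2015-06-01")

current = [r for r in dimension if r["is_current"]]
print(len(dimension), current[0]["city"])
```

Note that even this versioning records only one time axis (when the warehouse learned of the change), so a late-arriving correction and a genuine business change are indistinguishable, one of the gaps a true bitemporal design closes.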

Data complexity requires more understanding

Today's business environment is driving rapid growth in the complexity of both BI needs and the temporal characteristics of data. Decision making is increasingly real-time in nature. Predictive analytics extends our interest from the past and present into the future. Big data and the Internet of Things take us into a new world where the people who use data often don't control its structure or content. Even the bitemporal model, Johnston is reluctantly forced to admit, may turn out to be insufficient to carry all the temporal meaning that a user may wish to impart to or extract from business data. A tritemporal model, with three time axes, may eventually be required to make full sense of data.

And to quote from my own foreword to Johnston's book: "This understanding is vital if we are to navigate the expanding world of big data. I will go further. I believe that, without such an understanding, we are destined to crash and burn."

For those involved in the world of data warehousing, BI and analytics -- and especially those advocating a migration to some Hadoop-based data lake -- revisiting the theory and practice of temporal data should be seen as mandatory preparation for the onrushing data-driven future.

About the author:
Barry Devlin is among the foremost authorities in the world on business insight and data warehousing. His current interest is in the wider field of a fully integrated business, covering informational, operational and collaborative environments. He is the founder and principal of 9sight Consulting; email him at
[email protected].


Next Steps

More from Barry Devlin: Keep a place for intuition in data analysis and decision making

See why he thinks the much-ballyhooed data lake concept is all wet for managing big data

Get his take on why effective data management and governance are a must with big data
