This article originally appeared on the BeyeNETWORK.
There are three important “thought” evolutions of information management that drive both a desire and a need for effective management of information exchange. The first is the development of XML (and its multitudinous corollary offspring) to constitute a generic framework for implementation-independent exchanges of structured data objects. The second is the ever-improving network fabric that allows for rapid message interchange. The third is the growing interest in integrating collaboration (including both computation and data sharing) into both the enterprise and various extraprises. The nexus of these advances is a glaring need to standardize the way data is represented when it moves from one administrative domain to another.
There is no question that XML provides fertile ground to support the development of data standards. But in typical chicken-and-egg fashion, does the existence of something like XML dictate the need to define standards? In the early days of XML, that almost appeared to be the case. In practice, it is the other way around: the need to define standards may make use of XML, but it does not originate there. Among our clients, there are actually five major aspects of standardizing information exchange that can be reviewed when developing a data standards strategy. Those aspects are:
- Content, or what exactly is being exchanged;
- Structure, or how data objects are packaged for exchange;
- Exchange, or what application process is used for exchanging data;
- Presentation, or what data objects look like when published to various end-clients; and
- Management, or keeping track of the data standard in a way that is accessible to all constituents.
Using these focus areas as the starting point for discussions, we have found that those tasked with developing data standards have a better chance both of developing guidelines usable by the general stakeholder constituency and of articulating the value of a data standards program. In this article, I will focus on the first four aspects, leaving data standards management for a future discussion.
A side effect of working in a particular area for a long time is the gradual dulling of vocabulary semantics. In other words, as terms and phrases are subject to more frequent use by subject matter experts, their meanings become implicit, to the point where those phrases are understood based purely on context. The trouble with this is the potential for “meaning entropy,” where slight variations in meaning become ingrained within the consciousnesses of subject matter experts. This is not typically a problem until two organizations begin sharing information and the slight differences in meaning are not resolved. To address this, a back-to-basics approach of documenting clear definitions will help resolve differences.
This is where the aspect of content comes in. Before determining how data is to be exchanged, first enumerate those data elements that are relevant to the business purpose for which the data exchange is intended. Once that list is available, a concise declarative definition should be provided for each element. In other words, provide a definition framed not in terms of process or operations, but in terms of the business concept being captured. The next step is to circulate those definitions among the stakeholder community that is participating in the data exchange; the intention is to reach consensus on the data element list and the corresponding definitions, as well as gain buy-in for participating in the standards definition process.
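As a minimal sketch of what such a catalog might look like (the element names and definitions here are hypothetical, not a published standard), each entry pairs a name with a declarative, process-free definition that can be circulated for review:

```python
# A hypothetical data element catalog: each entry pairs an element name
# with a declarative definition (what business concept it captures),
# not a procedural one (how the value is produced or used).
data_elements = [
    {"name": "CustomerName",
     "definition": "The legal name of the party purchasing goods or services."},
    {"name": "CustomerID",
     "definition": "The unique identifier assigned to a customer."},
    {"name": "PostalAddress",
     "definition": "The location to which correspondence is delivered."},
]

def circulate(elements):
    """Render the catalog as plain text for stakeholder review."""
    return "\n".join(f"{e['name']}: {e['definition']}" for e in elements)

print(circulate(data_elements))
```

The point of keeping each definition declarative is that it can be agreed to by stakeholders who do not share the same internal processes.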
At this point, attention should be directed to determining the data types of the data elements, as well as the format representation during the data exchange. These details will provide some directives and drive requirements for developing the extraction, transformation and load (ETL) processes for the data values that are ultimately going to be incorporated into a data exchange.
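One way the agreed types and formats translate into ETL requirements is as machine-checkable rules. The element names and format patterns below are assumptions for illustration only:

```python
import re

# Hypothetical format rules for exchanged values: each data element is
# assigned a representation pattern that ETL code can enforce on the
# values destined for a data exchange.
formats = {
    "CustomerID": re.compile(r"\d{9}"),              # 9-digit identifier
    "OrderDate":  re.compile(r"\d{4}-\d{2}-\d{2}"),  # ISO 8601 date
    "Amount":     re.compile(r"\d{11}"),             # zero-filled, implied cents
}

def validate(element, value):
    """Return True if a value conforms to its declared exchange format."""
    pattern = formats.get(element)
    return bool(pattern and pattern.fullmatch(value))
```

A transformation stage can then reject or repair nonconforming values before they ever reach a partner.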
Only after the stakeholders have agreed to the list of data elements and their corresponding definitions should an organization look at how the data elements are grouped together as part of an exchange. In XML, the concept of a document or message incorporates the fact that data objects are represented in a hierarchical fashion and can be rolled up into a single bundle. Logically, this implies that data elements group together for one of two reasons: either a set of data values attributes a single entity, or there is a desire to exchange a group of similar objects together. For example, when exchanging customer information, the name, address and identification number are attributes of each individual customer, while a set of customer objects may be sent together in one exchange.
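Both kinds of grouping can be seen in one small document: attributes rolled up under each entity, and similar entities bundled into a set. The element names here are illustrative, not a published standard:

```python
import xml.etree.ElementTree as ET

# Two reasons to group elements, visible in one structure: values that
# attribute a single entity (Name, Address, ID under one Customer), and
# a collection of similar objects bundled in one exchange (many
# Customer elements under a CustomerSet).
def build_exchange(customers):
    root = ET.Element("CustomerSet")
    for c in customers:
        cust = ET.SubElement(root, "Customer")
        ET.SubElement(cust, "Name").text = c["name"]
        ET.SubElement(cust, "Address").text = c["address"]
        ET.SubElement(cust, "ID").text = c["id"]
    return ET.tostring(root, encoding="unicode")

doc = build_exchange([
    {"name": "Acme Corp", "address": "1 Main St", "id": "000000001"},
    {"name": "Globex", "address": "2 Elm St", "id": "000000002"},
])
```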
An interesting aspect of structure definitions is the potential for increased abstraction. The hierarchical representation of data objects can be easily described using XML Schemas. The trick is to make sure that the tail is not wagging the dog—before starting any schema design, be aware that you are probably not the only person within the organization thinking about using XML, nor are you the only person who is interested in exchanging commonly used data objects. Take the time to review the structure with both business and technical people and leave open the potential to reduce overall complexity by allowing object attribution to stand in for potentially disjoint representations.
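To illustrate the kind of abstraction meant here, consider a sketch of an XML Schema fragment (all names hypothetical) in which a shared Address type is factored out so that two object definitions reuse one structure rather than carrying disjoint representations:

```xml
<!-- Illustrative only: a shared complex type lets Customer and Supplier
     reuse one Address structure instead of two disjoint ones. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
      <xs:element name="PostalCode" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Supplier">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Address" type="AddressType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Reviewing candidate structures with other groups before committing to a schema is how such shared types get discovered.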
The previous aspects deal with what is being exchanged; this aspect deals with the actual exchange mechanics. This incorporates the questions of how data is to be packaged, whether the applications that are exchanging data will use a framework like XML, how the schemas are to be prepared and distributed, what kinds of networks to employ, security and authentication and the software framework to use in developing and deploying the interchange.
The issues that need to be explored center on infrastructure and capability. Despite XML's perceived popularity, many organizations have yet to take advantage of it. There is some level of expertise and embedded application infrastructure that needs to be in place to integrate a Web services-style architecture with legacy production information management systems. Therefore, it is reasonable to take a gradual adoption approach: review the current capabilities of each partner in the interchange to find a lowest common denominator for deployment, enabling as much participation as possible. However, realize that your exchange architecture should be forward-looking; design for the future, since most of your partners will eventually need to modernize their externally facing interfaces. Note that a well-designed XML schema will nicely accommodate this evolutionary approach, since the tagged elements in the structure may be acted upon or ignored based on the recipient's capability.
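The "act upon or ignore" point can be made concrete: an older recipient extracts only the tags it understands, so a newer sender can add elements without breaking it. Tag names here are hypothetical:

```python
import xml.etree.ElementTree as ET

# A recipient built against an earlier version of the schema: it reads
# only the elements it knows about and quietly skips anything newer.
KNOWN = {"Name", "ID"}

def read_customer(xml_text):
    elem = ET.fromstring(xml_text)
    return {child.tag: child.text for child in elem if child.tag in KNOWN}

# A newer sender adds <LoyaltyTier>; the older recipient is unaffected.
msg = ("<Customer><Name>Acme</Name><ID>007</ID>"
       "<LoyaltyTier>Gold</LoyaltyTier></Customer>")
```

This is the mechanical reason a well-designed schema supports partners at different stages of adoption.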
The last aspect in this article deals with how the exchanged data is presented to the end-client at the end of the information chain. There may be significant differences between the way a data element is formatted for an exchange and how it is ultimately displayed. For example, in many instances dollar amounts are represented in numeric form but not as decimal values, with the rightmost two digits presumed to be the amount in cents. The value format for the exchanged document itself may be an 11-character numeric value, left-filled with zeros, with the rightmost two digits representing two decimal places. Yet when that value is displayed on a browser screen, the standard might specify that the leading zeros be removed, that the value be displayed with an explicit decimal point, in a specific font and size, with a $ positioned to the left of the leading digit, in black if the value is positive and in red if it is negative.
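The gap between exchange format and display format can be shown directly. A minimal sketch, assuming the 11-character implied-decimal representation described above (the sign flag and color rules are illustrative, since the exchange value itself is unsigned):

```python
# Convert the exchange representation (an 11-character, zero-filled
# numeric string with an implied two-digit cents portion) into a
# display form: leading zeros dropped, explicit decimal point, $ sign,
# and a color chosen by the sign of the amount.
def display_amount(exchange_value, negative=False):
    if len(exchange_value) != 11 or not exchange_value.isdigit():
        raise ValueError("expected an 11-digit zero-filled value")
    dollars = int(exchange_value[:-2])   # drops the left-filled zeros
    cents = exchange_value[-2:]
    text = f"${dollars:,}.{cents}"
    color = "red" if negative else "black"
    return text, color
```

So the same exchanged value "00000123456" travels as a fixed-width string but reaches the screen as $1,234.56.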
Note that the end-client is not always an individual; presentation deals with how the data is forwarded to its ultimate destination, which may be a person, a document or perhaps another forwarding web service module. The determination of presentation should typically be deferred to the client application. The XSL Transformations (XSLT) framework of XML allows a single document to have more than one presentation, with the determination made based on the end-application's requirements.
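As a sketch of how this deferral works in practice, one end-application might apply a minimal stylesheet like the following (element names hypothetical, matching a CustomerSet exchange document) to render the data as HTML, while a different client applies an entirely different transform to the same document:

```xml
<!-- Illustrative stylesheet: one client's rendering of the exchange
     document; another client can apply a different transform. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/CustomerSet">
    <html><body>
      <xsl:for-each select="Customer">
        <p><xsl:value-of select="Name"/> (<xsl:value-of select="ID"/>)</p>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
```

Because the stylesheet lives with the client rather than the exchange, the standard constrains only what travels over the wire, not what appears on the screen.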
More to Explore
In a future article, I will discuss the management aspect of the data standards process. Without the management component, there remain issues that can derail the data standards process. But when one is at the beginning of a process that requires a close look at data standardization for information exchange, using these aspects as a starting point for developing a data standards program will help expose some of the more common barriers to successful information exchange.