Metadata, Business Rules & Semantics

Data quality can benefit from a unified theory of information intelligence.

This article originally appeared on the BeyeNETWORK.

Our industry has become extremely fragmented, and nowhere is this more apparent than in a Business Intelligence environment. We have experts in application development who specialize in object-oriented design and development. Then we have experts in database management who specialize in relational design, both logical and physical. In addition, we have Data Warehouse specialists, and even within data warehousing we have business intelligence application developers, ETL developers and integration specialists; we also have metadata design and development experts. Not to mention Database Administrators (DBAs), System Administrators, and Data Administrators.

What is needed is unification. Bill Inmon published a great article several months ago about the Unified Theory of Metadata. This article goes a step further. It brings together several disciplines within Information Management to accomplish goals that both government entities and corporations deem important, such as Data Quality. It also describes how a unified approach can make businesses more responsive to their customers and to the changing business landscape.

Basic Problem in Information Management

I would argue that, based on my 20+ years in data management, the number one problem today in turning data into something useful to the enterprise is a lack of understanding of that data. This covers both what the data means on a fundamental level and the context behind it. Let me cite a few examples that all of us are familiar with:

  • Mars Climate Orbiter: the data was assumed to represent one unit of measure, but it was actually produced in another (English pound-force seconds vs. metric newton-seconds). As a result, millions of dollars' worth of equipment was lost (see the following website for the article: http://mpfwww.jpl.nasa.gov/msp98/news/mco990930.html)
  • Bombing of the Chinese Embassy in Belgrade in 1999: the map used for targeting was out of date.
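The Mars Climate Orbiter loss came down to a number travelling without its unit. Below is a minimal Python sketch of how carrying the unit alongside the value forces that assumption into the open. The `Impulse` type and the unit strings are hypothetical illustrations; only the lbf·s to N·s conversion factor is a standard constant.

```python
from dataclasses import dataclass

# 1 pound-force = 4.4482216152605 newtons, so 1 lbf*s = 4.4482216152605 N*s
LBF_S_TO_N_S = 4.4482216152605

@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit (hypothetical type)."""
    value: float
    unit: str  # "N*s" or "lbf*s" in this sketch

    def to_newton_seconds(self) -> float:
        # Convert explicitly; never assume which unit the producer used.
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit!r}")

def total_impulse(readings: list[Impulse]) -> float:
    """Sum readings in N*s, converting each one according to its declared unit."""
    return sum(r.to_newton_seconds() for r in readings)

readings = [Impulse(10.0, "N*s"), Impulse(10.0, "lbf*s")]
print(round(total_impulse(readings), 3))  # 54.482
```

Nothing here computes anything clever. The point is that the meaning of the number (its unit) travels with the data instead of living in someone's head.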

Public corporations and government agencies face the yearly challenge of producing an annual report filled with counts and statistics. The assumptions behind these counts can easily be misunderstood and misconstrued. Consider the simple problem of counting how many customers you have. What is the definition of a customer? Different definitions of a customer produce different counts. For example, does a customer of a software company who pays maintenance on a previously purchased license count as a customer, or do you count only new customers? What about repeat customers? Or do you define new terms that name each of these classifications?
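Here is a small Python sketch of the customer-count problem. It makes each definition of "customer" an explicit, named predicate and shows that the same data yields different counts. The `Account` fields, the sample accounts and the three definitions are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Account:
    customer_id: str
    has_new_purchase: bool      # bought a new license this period (hypothetical field)
    on_maintenance_only: bool   # only paying maintenance on an older license
    purchase_count: int         # lifetime number of license purchases

accounts = [
    Account("A", True, False, 1),
    Account("B", False, True, 1),
    Account("C", True, False, 3),
    Account("D", False, True, 1),
]

# Each competing definition of "customer" is written down and named,
# so a reported count can always say which definition produced it.
DEFINITIONS = {
    "any_active_relationship": lambda a: a.has_new_purchase or a.on_maintenance_only,
    "new_purchase_only": lambda a: a.has_new_purchase,
    "repeat_customer": lambda a: a.purchase_count > 1,
}

def count_customers(accounts, definition_name):
    """Count the accounts that satisfy the named definition."""
    rule = DEFINITIONS[definition_name]
    return sum(1 for a in accounts if rule(a))

for name in DEFINITIONS:
    print(name, count_customers(accounts, name))
# any_active_relationship 4
# new_purchase_only 2
# repeat_customer 1
```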

Extrapolate even further: consider business people who use a BI (business intelligence) tool to look at statistics in multidimensional analysis and make financial decisions from them. It is highly likely that at least one cube carries hidden assumptions about the meanings of business terms. The users may not be aware of these assumptions, or their own assumptions may differ from those of the person who built the cube. Misunderstood data can unwittingly cause many errors in judgment, because the business terms used are not clarified and their definitions are not made readily available.

Semantics:  The Study of Meaning

The definitions, context, assumptions and rules surrounding business concepts are the semantics. In information systems throughout the years, we have done a poor job of capturing these semantics. (Remember how we all absolutely hated to do documentation?! Now it's coming back to bite us!)

So, as a good consultant, when I realized this, I began to help my clients build systems that kept track of these semantics.  In my mind I saw two very important items: definitions and business rules.  I helped my clients build repositories that store both of them.  Usually these repositories were home-grown, because the traditional repository tools didn’t store business rules or any type of business metadata very well.
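A home-grown repository of this kind can start very small. The sketch below is a minimal illustration using Python's built-in `sqlite3` module. It stores business terms with their definitions and business rules linked to the terms they use. The schema, the table names and the placeholder definitions are hypothetical, not the design the author's clients used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on

# Hypothetical schema: terms, rules, and which terms each rule is built from.
conn.executescript("""
CREATE TABLE term (
    name        TEXT PRIMARY KEY,
    definition  TEXT NOT NULL
);
CREATE TABLE business_rule (
    rule_id     INTEGER PRIMARY KEY,
    statement   TEXT NOT NULL
);
CREATE TABLE rule_term (
    rule_id     INTEGER REFERENCES business_rule(rule_id),
    term_name   TEXT REFERENCES term(name),
    PRIMARY KEY (rule_id, term_name)
);
""")

conn.executemany("INSERT INTO term VALUES (?, ?)", [
    ("Customer", "Placeholder definition of a customer"),
    ("Premier Customer", "Placeholder definition of a premier customer"),
])
conn.execute("INSERT INTO business_rule VALUES (1, 'Only Premier Customers are permitted to buy on credit.')")
conn.execute("INSERT INTO rule_term VALUES (1, 'Premier Customer')")

def rules_using(term_name):
    """Return every rule statement that references the given term."""
    rows = conn.execute("""
        SELECT r.statement FROM business_rule r
        JOIN rule_term rt ON rt.rule_id = r.rule_id
        WHERE rt.term_name = ?""", (term_name,))
    return [row[0] for row in rows]

print(rules_using("Premier Customer"))
# ['Only Premier Customers are permitted to buy on credit.']
```

Linking rules to terms is what lets someone ask "which rules depend on this definition?" before changing it.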

Taxonomies and Ontologies

Taxonomy and ontology are words that are being used increasingly alongside semantics. I am a relative newcomer in this area, so I will do my best to describe these concepts as I understand them. 

Taxonomy means a classification scheme, usually involving a hierarchy. An ontology is a classification scheme with more semantics added. Usually an ontology has more complex business rules concerning relationships and navigation within the taxonomy. Ontologies can be developed and standardized for specific industries; the medical field, for example, has many different ontologies. Today, many industries are beginning to publish standard ontologies, and proponents of the World Wide Web and search engines are driving these efforts. Standardized ontologies can make web searches easier. The whole point of taxonomies and ontologies is to create common semantics, so that whenever a term is used, everyone knows instantly what it means. A taxonomy is a step beyond a dictionary because it places each term in a classification rather than giving a bare definition. An ontology goes further still, adding contextual information such as rules.
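The distinction can be sketched in a few lines of Python. A taxonomy is just the hierarchy (here a child-to-parent map). The ontology layer attaches rules to terms and lets a term inherit the rules of its ancestors. All term names and rules below are hypothetical examples.

```python
# Taxonomy: a classification hierarchy, child -> parent (hypothetical terms).
TAXONOMY = {
    "Premier Customer": "Customer",
    "Maintenance Customer": "Customer",
    "Customer": "Party",
    "Supplier": "Party",
}

def ancestors(term):
    """Walk up the hierarchy from a term to the root."""
    chain = []
    while term in TAXONOMY:
        term = TAXONOMY[term]
        chain.append(term)
    return chain

# Ontology layer: the same hierarchy plus rules attached to terms (hypothetical rules).
ONTOLOGY_RULES = {
    "Premier Customer": ["may buy on credit"],
    "Customer": ["must have at least one billing address"],
}

def rules_for(term):
    """Collect the rules on a term and on everything it inherits from."""
    collected = []
    for t in [term] + ancestors(term):
        collected.extend(ONTOLOGY_RULES.get(t, []))
    return collected

print(ancestors("Premier Customer"))  # ['Customer', 'Party']
print(rules_for("Premier Customer"))  # ['may buy on credit', 'must have at least one billing address']
```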

There are some new Commercial Off the Shelf (COTS) tools that are beginning to address these issues, such as Contivo. One major component of good codified semantics is a good dictionary. Dave Hollander of Contivo stresses that a good definition takes a lot of work to create, and that it should be "set complete." This hearkens back to set theory, where a set is well defined and membership in it is specified completely. In other words, the definition makes it very clear what constitutes membership in the set and gives detailed criteria for membership. Getting business people to specify good, "set-complete" definitions is difficult. We have not trained ourselves to be precise, and the language we use actually encourages ambiguity.
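One way to test whether a definition is set complete is to write it as a membership predicate. Every candidate must get a definite yes or no, and the criteria must be stated in full. The Python sketch below contrasts a vague wording with a precise one. The term "Active Customer" and its 365-day criterion are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Party:
    name: str
    paid_invoice_dates: list  # dates on which the party's invoices were paid

# Vague wording: "someone who buys from us regularly" (what counts as regularly?).
# Set-complete wording (hypothetical criteria): a party is an Active Customer on
# as_of if and only if it has at least one paid invoice dated within the 365 days
# up to and including as_of.
def is_active_customer(party: Party, as_of: date) -> bool:
    window_start = as_of - timedelta(days=365)
    return any(window_start <= d <= as_of for d in party.paid_invoice_dates)

as_of = date(2024, 6, 30)
print(is_active_customer(Party("A", [date(2024, 1, 15)]), as_of))  # True
print(is_active_customer(Party("B", [date(2022, 12, 1)]), as_of))  # False
print(is_active_customer(Party("C", []), as_of))                   # False
```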

Relationships

An ontology should include the relationships between the dictionary terms. The relational model we use for database design is good at expressing relationships, but it often does not express certain semantics well, such as exact cardinality ("up to and including three, but no more than three"). The object theorists have pointed out that relational theory does not handle hierarchies well. However, hierarchies incorporated into database designs seldom work, because in the real world things are often not as cut-and-dried as they seem in a database model. Real-world objects often change classifications; in other words, the hierarchy may be dynamic, and most modeling methodologies do not handle dynamic hierarchies well. In addition, an object may be a member of more than one hierarchy at the same time (multiple inheritance). Multiple inheritance can be complex, because rules may govern the inheritance, and the OO model may not be able to incorporate these rules adequately.
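The three problems named above can be sketched together in Python. The first is an exact cardinality rule. The second is a hierarchy membership that changes over time. The third is one object belonging to several hierarchies at once. Here each membership is an effective-dated record instead of a fixed parent pointer. The names `Membership`, `classifications` and `within_cardinality`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Membership:
    """One object's membership in one hierarchy node for a period of time."""
    obj: str
    hierarchy: str
    node: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the membership is still current

def classifications(memberships, obj, on):
    """All (hierarchy, node) pairs an object belongs to on a given date."""
    return sorted(
        (m.hierarchy, m.node)
        for m in memberships
        if m.obj == obj and m.valid_from <= on and (m.valid_to is None or on < m.valid_to)
    )

def within_cardinality(count, minimum=0, maximum=3):
    """Exact cardinality: up to and including three, but no more than three."""
    return minimum <= count <= maximum

memberships = [
    Membership("Acme", "Sales Region", "East", date(2020, 1, 1), date(2023, 1, 1)),
    Membership("Acme", "Sales Region", "West", date(2023, 1, 1)),
    Membership("Acme", "Industry", "Manufacturing", date(2020, 1, 1)),
]

print(classifications(memberships, "Acme", date(2022, 6, 1)))
# [('Industry', 'Manufacturing'), ('Sales Region', 'East')]
print(classifications(memberships, "Acme", date(2024, 6, 1)))
# [('Industry', 'Manufacturing'), ('Sales Region', 'West')]
print(within_cardinality(3), within_cardinality(4))  # True False
```

Effective dating is only one way to model dynamic hierarchies. It is shown here because it keeps the history of reclassifications instead of overwriting it.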

Business Rules and Semantics

Business rules combine process and data: they incorporate aspects of both and are not exclusively one or the other. Business rules govern processes and constrain data, so it makes sense that any ontology should contain business rules.

Business rules should use well-defined terms as building blocks and should be constructed from these terms.  For example: 

  • A Customer is defined as…
  • A Premier Customer is defined as…
  • Only Premier Customers are permitted to buy on credit.
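The same layering can be expressed in code, with each rule referring only to defined terms and never to raw fields. The article deliberately leaves the two definitions open. The criteria in this Python sketch (`has_purchased`, an annual spend of at least 100,000) are invented placeholders, not real definitions.

```python
from dataclasses import dataclass

@dataclass
class Party:
    name: str
    has_purchased: bool   # hypothetical attribute
    annual_spend: float   # hypothetical attribute

# Building block: "A Customer is defined as..." (placeholder criterion).
def is_customer(p: Party) -> bool:
    return p.has_purchased

# Building block: "A Premier Customer is defined as..." (placeholder criterion),
# itself built on the Customer definition.
def is_premier_customer(p: Party) -> bool:
    return is_customer(p) and p.annual_spend >= 100_000

# The rule is stated only in terms of defined terms.
def may_buy_on_credit(p: Party) -> bool:
    """Only Premier Customers are permitted to buy on credit."""
    return is_premier_customer(p)

print(may_buy_on_credit(Party("Acme", True, 250_000)))  # True
print(may_buy_on_credit(Party("Bolt", True, 5_000)))    # False
```

If the definition of Premier Customer changes, the credit rule picks up the change without being rewritten.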

Business Rules and the Business User

Business rules should be accessible to business users, who should be able to look at the rules at any time and even modify them with permission. However, in most Business Intelligence environments the rules are buried in code. Therefore, a dedicated effort should be made to get them out of the code and into an architectural layer where they can be stored centrally but accessed locally whenever necessary. This is possible with today's technology, but it is time-consuming.

It may be easier to do this if you have the business case to migrate to a rules-engine solution, because the rules will be discovered during the analysis for that solution. Depending on the tool used and how usable it is for business people, rules can be stored in a user-friendly format such as decision tables, English-like sentences, or process flow diagrams. The next question is whether the tool allows other structures containing contextual information to be connected to the rules. For the purposes of this discussion, we are interested in two kinds of rules. The first kind can be connected to other information in the business intelligence environment (such as the approval body, or when the rule was "turned on"). The second kind actually governs running systems (not just documentation). Business rules need to be managed as metadata.
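Below is a minimal Python sketch of a decision table that carries its own metadata: the approval body and the date the rule was turned on. The running system refuses to apply the rule outside its active period, so the same rule serves as both documentation and behavior. The rule ID, table rows, committee name and dates are hypothetical, and no particular rules-engine product is implied.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class RuleMetadata:
    rule_id: str
    approved_by: str               # approval body
    active_from: date              # when the rule was "turned on"
    active_to: Optional[date] = None

# Decision table (hypothetical rows): (customer_tier, order_over_limit) -> decision
CREDIT_TABLE = [
    ("Premier", False, "approve"),
    ("Premier", True, "refer to credit desk"),
    ("Standard", False, "cash only"),
    ("Standard", True, "cash only"),
]
CREDIT_META = RuleMetadata("CREDIT-01", "Credit Committee", date(2024, 1, 1))

def is_active(meta, on):
    return meta.active_from <= on and (meta.active_to is None or on < meta.active_to)

def credit_decision(tier, over_limit, on):
    """Look up the decision, but only if the rule is switched on for that date."""
    if not is_active(CREDIT_META, on):
        raise LookupError(f"{CREDIT_META.rule_id} is not active on {on}")
    for row_tier, row_over, outcome in CREDIT_TABLE:
        if row_tier == tier and row_over == over_limit:
            return outcome
    raise LookupError(f"no row in {CREDIT_META.rule_id} for {tier!r}, {over_limit}")

print(credit_decision("Premier", True, date(2024, 3, 1)))  # refer to credit desk
```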

Components of an Information System

In the traditional way of building systems, there are designers who build the GUI (graphical user interface) for an application, programmers (writing 3GL code in their favorite language), and DBAs who design and build the database. These functions are very different from one another, and each has its own methodologies and best practices. Sometimes the GUI designers double as programmers, writing the business logic in addition to creating the user interface, such as the screens and/or reports. The goal is to separate the business rules from the rest of the application logic and isolate them, so that business people can view and modify them. Business rules can then serve as decision aids for the business, and they could be referenced as read-only from any BI tool.
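One way to separate rules from the rest of the logic is to keep them as data and evaluate them with a generic engine that contains no business knowledge. Someone who edits the rule store then changes behavior without touching program code. In this Python sketch, a JSON string stands in for a hypothetical central rules repository. The rule format (`field`, `op`, `value`, `compare_to_field`) is invented for illustration.

```python
import json
import operator

# Rules kept as data in a central store (this JSON string stands in for a
# hypothetical shared repository), not hard-coded in application logic.
RULE_STORE = """
[
  {"id": "R1", "text": "Order total must not exceed the credit limit",
   "field": "order_total", "op": "<=", "compare_to_field": "credit_limit"},
  {"id": "R2", "text": "Quantity must be at least 1",
   "field": "quantity", "op": ">=", "value": 1}
]
"""

OPS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
       "<": operator.lt, ">": operator.gt}

def load_rules(text):
    return json.loads(text)

def violations(record, rules):
    """Return the IDs of the rules a record breaks; the engine itself holds no business logic."""
    failed = []
    for rule in rules:
        left = record[rule["field"]]
        right = record[rule["compare_to_field"]] if "compare_to_field" in rule else rule["value"]
        if not OPS[rule["op"]](left, right):
            failed.append(rule["id"])
    return failed

rules = load_rules(RULE_STORE)
print(violations({"order_total": 500, "credit_limit": 1000, "quantity": 2}, rules))   # []
print(violations({"order_total": 1500, "credit_limit": 1000, "quantity": 0}, rules))  # ['R1', 'R2']
```

Each rule also keeps its plain-English `text`, so the same record can be shown to a business user and executed by the system.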

Business rules should be able to trigger analysis as well as feed the analysis.  In this way, business rules can serve to facilitate the business feedback loop.  
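The following Python sketch illustrates both ideas. BI code gets a read-only view of a central rule catalog, and evaluating a rule can trigger an analysis when too many records break it. `types.MappingProxyType` is a real standard-library type. The rule, the 25% alert level and the callback are hypothetical. Note that the proxy blocks only top-level assignment; this sketch does not freeze the nested rule dictionaries.

```python
from types import MappingProxyType

# Central rule catalog (hypothetical rule); BI code receives a read-only view.
_catalog = {"MIN-MARGIN": {"text": "Gross margin per order must be at least 10%",
                           "threshold": 0.10}}
rules_for_bi = MappingProxyType(_catalog)  # top-level assignment raises TypeError

def check_margins(orders, on_violation_rate, alert_level=0.25):
    """Evaluate the rule on a batch; if too many orders break it, trigger analysis."""
    rule = rules_for_bi["MIN-MARGIN"]
    broken = [o for o in orders if o["margin"] < rule["threshold"]]
    rate = len(broken) / len(orders) if orders else 0.0
    if rate > alert_level:
        on_violation_rate("MIN-MARGIN", rate, broken)  # feedback into analysis
    return rate

alerts = []
orders = [{"id": 1, "margin": 0.05}, {"id": 2, "margin": 0.20},
          {"id": 3, "margin": 0.08}, {"id": 4, "margin": 0.30}]
rate = check_margins(orders, lambda rid, r, rows: alerts.append((rid, r, len(rows))))
print(rate, alerts)  # 0.5 [('MIN-MARGIN', 0.5, 2)]
```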

How Business Rules and Good Definitions Steer Data Quality

Practical Application

Everybody has problems with Data Quality. What is a good definition of Data Quality? The specifics of Data Quality differ from enterprise to enterprise, because Data Quality is defined by the business. Dirty data is any data that does not conform to the business rules governing what the business needs to do its job.
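If dirty data is data that breaks the business rules, then each element's rules also give a metric: the share of values that pass all of them. Here is a minimal Python sketch. The elements, their rules (including the simple US ZIP pattern) and the sample records are hypothetical.

```python
import re

# Each element's definition doubles as its quality rules (hypothetical rules).
ELEMENT_RULES = {
    "postal_code": [
        ("present", lambda v: v not in (None, "")),
        ("US ZIP format", lambda v: v is not None and re.fullmatch(r"\d{5}(-\d{4})?", v) is not None),
    ],
    "email": [
        ("present", lambda v: v not in (None, "")),
        ("has @", lambda v: v is not None and "@" in v),
    ],
}

def conformance(records, element):
    """Share of records whose value passes every rule defined for the element.
    With no records there is nothing nonconforming, so 1.0 is returned."""
    rules = ELEMENT_RULES[element]
    passing = sum(1 for r in records if all(check(r.get(element)) for _, check in rules))
    return passing / len(records) if records else 1.0

records = [
    {"postal_code": "12345", "email": "a@x.com"},
    {"postal_code": "1234", "email": ""},
    {"postal_code": "12345-6789", "email": "b@y.org"},
    {"postal_code": None, "email": "c@z.net"},
]
print(conformance(records, "postal_code"), conformance(records, "email"))  # 0.5 0.75
```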

Business rules constrain the data, so they must play a central role in Data Quality. This is readily apparent. What is not so apparent is the need for good definitions to support Data Quality. How you define what a data element represents is also how you define data quality for that element, and that definition should also dictate the element's data quality metrics.

For example, what is meant by “Address”?  Most systems distinguish between bill-to and ship-to addresses.  But consider: 

  • Is the address current?
  • Does the address accept mail or is it a geographic location only?
  • Is it only a legal address (e.g., one appearing on a contract) with no bearing on current reality (i.e., mail should not be sent to it)?
  • Is it seasonal?
  • Does it accept letters only and no FedEx overnight deliveries (like a P.O. Box)?
  • What about a subsidiary address? 

The enterprise first must define which addresses are important to the business and need to be stored.  Next, data elements need to be created for each address type and verified with the business. 
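Once the business has decided which address types matter, each type can become its own explicitly defined element with the attributes raised in the questions above. This Python sketch shows one possible shape. The set of types, the attributes and the sample addresses are hypothetical; each enterprise would choose its own.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class AddressType(Enum):
    # Hypothetical set of types; the business decides which ones it needs.
    BILL_TO = "bill-to"
    SHIP_TO = "ship-to"
    LEGAL = "legal"        # appears on contracts; not for mail
    PO_BOX = "po-box"      # letters only, no courier deliveries
    SEASONAL = "seasonal"  # valid only for part of the year

@dataclass(frozen=True)
class Address:
    kind: AddressType
    lines: str
    accepts_mail: bool
    accepts_courier: bool
    valid_from: date
    valid_to: Optional[date] = None  # None = still current

    def is_current(self, on: date) -> bool:
        return self.valid_from <= on and (self.valid_to is None or on < self.valid_to)

def overnight_delivery_targets(addresses, on):
    """Addresses that are current and accept courier deliveries on the given date."""
    return [a for a in addresses if a.is_current(on) and a.accepts_courier]

addresses = [
    Address(AddressType.LEGAL, "1 Contract Way", False, False, date(2019, 1, 1)),
    Address(AddressType.PO_BOX, "PO Box 42", True, False, date(2019, 1, 1)),
    Address(AddressType.SHIP_TO, "9 Old Dock Rd", True, True, date(2019, 1, 1), date(2022, 1, 1)),
    Address(AddressType.SHIP_TO, "7 New Dock Rd", True, True, date(2022, 1, 1)),
]
print([a.lines for a in overnight_delivery_targets(addresses, date(2024, 5, 1))])  # ['7 New Dock Rd']
```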

The Difference Between Creating and Using Data

There is often a disconnect between the business process that creates the data and the process that uses it. Each has different assumptions about, and contexts for, the data. This is perhaps where semantics can bridge the gap. The figure below shows how my data quality methodology, called the Cornerstones of Data Quality™, works:

The Data Quality strategy needs to take into account the requirements of both the business processes that create the data and those that use it, and it needs to integrate them. Together, these requirements define and measure data quality for the specific element involved.
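A small Python sketch shows why both sides matter. Data that fully satisfies the creating process can still fail the using process, and the integrated definition exposes that gap. Both rule sets are hypothetical. The consumer check is a simplified format test loosely modelled on international E.164 numbering (a leading "+" followed only by digits), not a full validator.

```python
# Requirements from the process that CREATES the data (hypothetical):
# order entry only needs a phone number to be present.
CREATOR_RULES = {"phone": [lambda v: bool(v)]}

# Requirements from the process that USES the data (hypothetical):
# an outbound messaging campaign needs "+" followed only by digits.
CONSUMER_RULES = {"phone": [lambda v: isinstance(v, str) and v.startswith("+") and v[1:].isdigit()]}

def integrated_rules(element):
    """The element's quality definition combines both sides' rules."""
    return CREATOR_RULES.get(element, []) + CONSUMER_RULES.get(element, [])

def quality_report(values, element):
    """Compare quality as the creator sees it with the integrated view (values must be non-empty)."""
    if not values:
        raise ValueError("need at least one value to measure")
    creator_ok = sum(all(r(v) for r in CREATOR_RULES.get(element, [])) for v in values)
    both_ok = sum(all(r(v) for r in integrated_rules(element)) for v in values)
    return {"creator_view": creator_ok / len(values), "integrated_view": both_ok / len(values)}

phones = ["+15555550100", "555-0101", "", "+442071234567"]
print(quality_report(phones, "phone"))  # {'creator_view': 0.75, 'integrated_view': 0.5}
```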

Conclusion

This article outlines some fundamental principles about information management: 

  • We need to unify several different disciplines within data management to obtain a comprehensive approach to solving real-world problems; and
  • Data Quality is an example of one business problem that can be shown to benefit greatly from a “Unified Theory of Information Intelligence.” 

The Unified Theory of Information Intelligence will go far in helping to reconcile the age-old conflict between operating and managing the business. Bill Inmon was the first to point out that each of these areas requires its own paradigm for building information systems to meet business needs. Now we must go the next step and integrate these two worlds using a combination of disciplines from our own industry.
