This article originally appeared on the BeyeNETWORK
Because a growing number of business (and other) transactions now occur virtually through the World Wide Web, it is easy to forget that although the transactions are performed in the “ethersphere,” the parties involved in most transactions are still situated in some physical location, and completing those transactions often depends on actions that occur at specific, real-world coordinates. The availability of mounds of reference data sets related to geography, its associated statistics and demographics, combined with the growing network of direct-to-individual information delivery mechanisms (via GPS-capable devices), positions geographic enhancement as one of the big opportunities for business intelligence over the next few years.
Currently, a lot of geographic data is being used both for address scrubbing and for geocoding. Address scrubbing is not new – the process has been used for many years to cleanse addresses and improve the deliverability of mail. Geocoding – assigning a latitude and longitude to a specific location – is also not new, although it is less prominent in general practice, mostly because of limited awareness of its value for spatial analysis.
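To make the two processes concrete, here is a minimal sketch of address scrubbing followed by geocoding. The normalization rule and the lookup table are hypothetical stand-ins for the far richer reference data a commercial tool would supply.

```python
# Hypothetical reference data: cleansed address -> (latitude, longitude).
# A real geocoder ships with millions of such entries.
GEOCODE_TABLE = {
    "1600 PENNSYLVANIA AVE NW, WASHINGTON, DC 20500": (38.8977, -77.0365),
}

def scrub(address: str) -> str:
    """Greatly simplified address scrubbing: normalize case and whitespace."""
    return " ".join(address.upper().split())

def geocode(address: str):
    """Return (lat, lon) for a scrubbed address, or None if it cannot be matched."""
    return GEOCODE_TABLE.get(scrub(address))

print(geocode("1600 pennsylvania ave nw,  washington, dc 20500"))
# → (38.8977, -77.0365)
```

In practice, scrubbing also standardizes street suffixes, directionals, and ZIP codes against postal reference data, and geocoders report a match precision (rooftop, street segment, ZIP centroid) that applications must take into account.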
However, in those applications that employ spatial information in their business processes, there are certain expectations associated with the geographic services made available:
1. Data Quality: There is an expectation of high-quality geographic data used to enhance transaction data for the purposes of both operations and analysis.
2. Geographic Services: Applications will need access to services that either cleanse data containing geographical references (sometimes referred to as “scrubbing”) or append records with associated geographic data (demographics, customer lifestyle data, etc.).
3. Geographic Data: The actual location-related data attributes associated with existing records.
When it comes to assessing needs for creating master data repositories appended with geographic data, these expectations drive different thought processes in determining the most appropriate master data engineering approach. Let's start with aspect 3: there are many types of data sets that have associated geographic data. For example, most residential customer data has a home address; utility companies require service addresses; the retail banking industry monitors property and loan data. But once geographic attributes are associated with a record, do the values of the geographic attributes become specific attributes of the object described in the record?
For example, the electrical power provider knows the square footage of a house located at a specific residential address. The power company may be interested in assessing usage per person in the household or average monthly consumption over a twelve-month period. Home square-footage data is available from public property sales records; yet when that information is associated with the service address record, the square footage becomes an attribute of the customer’s data, not just of the location associated with that customer. This opens a little can of worms: once attribute values associated with a location have been appended to an application record, the connection between the appended value and the original location is lost. Therefore, if at some point the homeowners expand their house and add more square footage, that information will be updated in the source geographic data set, but not necessarily in the electrical power provider’s data set.
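The staleness problem described above can be sketched in a few lines. All names and values here are illustrative; the point is that the append copies a value rather than maintaining a link to the source.

```python
# Hypothetical source data set of property attributes keyed by address.
property_records = {"123 MAIN ST": {"square_feet": 1800}}

def append_geo_attributes(customer: dict) -> dict:
    """Copy location attributes onto the customer record at append time."""
    enriched = dict(customer)
    enriched["square_feet"] = property_records[customer["service_address"]]["square_feet"]
    return enriched

customer = append_geo_attributes({"name": "A. Smith", "service_address": "123 MAIN ST"})

# The homeowners build an addition; the source data set is updated...
property_records["123 MAIN ST"]["square_feet"] = 2200

# ...but the appended copy on the customer record is now out of date.
print(customer["square_feet"])                          # 1800
print(property_records["123 MAIN ST"]["square_feet"])   # 2200
```

The alternative designs discussed later in the article (querying the source at the point of use, or time-stamping each append) are ways of managing exactly this divergence.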
This brings us back to aspect 1, which posits the existence of high-quality geographic data sets, and rephrases the master data question: Should an organization build a master set consisting of all the geographic data used to append or cleanse application records? In reality, these data sets are likely to be a component of the overall solution for scrubbing or geocoding. Any geocoder or address-scrubbing tool should provide the base data for performing those processes, so creating an internal repository of that base data would duplicate a resource that might be better managed as part of the acquired tool. In addition, consider that the number of geographic data records for United States addresses alone is probably orders of magnitude greater than the number of customer records, which implies a great investment in resources and management. It is probably not a good idea to manage geographic reference data as an internal resource, since the cost and effort overhead will likely exceed the derived value.
This points us to aspect 2, which looks at the kinds of services needed to support the business processes and how those services are performed by technical solutions and their implementations. There will always be differences in the level of precision various applications expect for address scrubbing and for geocoding. As part of the requirements analysis process, one should explore what services each application owner expects to use, the levels of precision necessary, and how the different technical geographic data service solutions support those needs. As part of the data mastering program, one should also determine the enterprise’s structural needs for capturing and maintaining geographic data appended to existing data sets. Then one can review the many ways and times that geographic enhancement services are performed and whether there are opportunities to reduce or eliminate duplicated effort.
This allows us to reconsider aspect 3, but in a different way. Instead of looking at the static appending of geographic attributes to a record, we can think about the key geographic components of the record (or transaction, or analysis) at some point in time for a specific business purpose. Going back to our energy service example, the provider can query the public record to determine square footage on a monthly basis and log power use as a function of home size at each point in time.
So while the house is probably not going to move, the geographic reference attributes associated with the house will change from time to time, leading to inconsistencies, such as when the Post Office splits a ZIP code area, or when congressional district boundaries are redrawn. Analyzing historical trends may require the capture of a time-series of geographic appends over different time periods.
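The time-series approach suggested above can be sketched as follows: rather than overwriting a single appended value, each observation is logged together with the location attributes current at that date. The dates and values are illustrative.

```python
from datetime import date

# Each entry captures usage alongside the geographic attributes as they
# were at that point in time, so historical trends remain analyzable.
usage_log = []

def log_monthly_usage(as_of: date, kwh: float, square_feet: int) -> None:
    """Record power usage together with the home size current at that date."""
    usage_log.append({"as_of": as_of, "kwh": kwh, "square_feet": square_feet})

log_monthly_usage(date(2009, 1, 1), kwh=950.0, square_feet=1800)
log_monthly_usage(date(2009, 7, 1), kwh=1100.0, square_feet=2200)  # after expansion

# Usage per square foot at each point in time
for row in usage_log:
    print(row["as_of"], round(row["kwh"] / row["square_feet"], 3))
```

The same pattern applies to attributes that change for administrative reasons, such as ZIP code splits or redistricting: the append is valid for a time period, not forever.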
What does this all mean? It suggests that mastering geographic data is really about seeking out the location- or spatially-based data attributes, their relationships to other master object types, and the representational and time-based needs of the different applications for logging that data. Only then can the organization design an appropriate model for a master geographic data repository.
David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of Master Data Management, Enterprise Knowledge Management – The Data Quality Approach and Business Intelligence – The Savvy Manager's Guide and is a frequent speaker on maximizing the value of information. David can be reached at firstname.lastname@example.org or at (301) 754-6350.
Editor's note: More David Loshin articles, resources, news and events are available in the David Loshin Expert Channel on the BeyeNETWORK. Be sure to visit today!