This article originally appeared on the BeyeNETWORK.
Service level agreements (SLAs) have existed for many years within the IT world. But what about SLAs as they relate to data and its quality? Vendor-supplied data, in particular, rarely comes with any such guarantee.
Case in point: A top five bank has more than 2,000 contracts with external vendors involving data, but has SLAs for data with only two of the suppliers!
This bank is not alone. For most institutions, this situation is the norm, not the exception. Hence the focus of this article: How should an organization think about SLAs covering data and its quality? Can such a concept be implemented with both internal and external entities? These are tough questions, but they grow more relevant as greater emphasis is placed on data and as more of that data is provided or handled outside the institution’s control.
What is needed by the top five bank mentioned earlier (as well as by other institutions) is a brand-new type of service level agreement, one focused on the quality of the data provided back to the bank – the service level agreement for data (SLA4D). The SLA4D amounts to a guarantee of a degree of data quality. Once in place, institutions need to monitor the performance of these SLA4Ds to ensure compliance.
Where to Start – SLA4D Ingredients
An SLA4D is a contract between the data supplier and the data purchaser or data consumer guaranteeing delivery of a degree or level of data quality. The SLA4D can be a standalone contract or an appendix to the general terms and conditions of a purchase agreement. The SLA4D should, at a minimum, contain:
- Data quality metrics
- Expected data formats and delivery packaging
- Provider quality certification
- An agreed-upon method for adding to and/or modifying the SLA4D
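The ingredients above can also be captured in machine-readable form so that a contract checklist can be verified automatically. The sketch below is illustrative only: the section names and example values are assumptions, not part of any standard SLA4D schema.

```python
# Minimal sketch of an SLA4D expressed as a machine-readable record.
# All field names and example values here are illustrative assumptions.

REQUIRED_SECTIONS = {
    "quality_metrics",         # measurable data quality metrics
    "format_and_packaging",    # expected formats and delivery packaging
    "provider_certification",  # provider quality certification
    "change_procedure",        # agreed method for modifying the SLA4D
}

def validate_sla4d(sla: dict) -> list:
    """Return the sorted list of required SLA4D sections that are missing."""
    return sorted(REQUIRED_SECTIONS - set(sla))

example_sla = {
    "quality_metrics": {"completeness_pct": 99.0, "accuracy_pct": 99.5},
    "format_and_packaging": {"delivery": "sftp", "encoding": "utf-8"},
    "provider_certification": {"level": "six-sigma", "evidence": "test report"},
    "change_procedure": "written amendment signed by both parties",
}

print(validate_sla4d(example_sla))              # -> []
print(validate_sla4d({"quality_metrics": {}}))  # three sections missing
```

A check like this is useful as a gate before an agreement is signed or renewed: a template with a missing section fails fast rather than surfacing during a dispute.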
Data Quality Metrics
A primary ingredient of the SLA4D is measurable data quality metrics. If the parameter can’t be measured, it cannot be monitored or controlled and is, therefore, of little use as a condition or term in an SLA4D. An earlier article, The Partnership of Six Sigma and Data Certification, presented several candidate data quality metrics including:
Accuracy and precision. Accuracy refers to how closely a data value agrees with the correct or “true” value. Precision is the ability of a measurement or analytical result to be consistently reproduced, and/or the number of significant digits to which a value has been measured or calculated. One can be extremely precise and, at the same time, totally inaccurate.
Take, for example, the case of an oil filter manufacturer that produces millions of filters annually. It can accurately calculate revenues by capturing the sales price for each filter to two significant digits, representing the price of each filter sold. To calculate profit, the company collects and calculates costs for each filter to four significant digits. Due to errors in collecting, allocating and calculating costs, the company found that its cost per filter was off by as much as 10 cents per filter, yielding a profit calculation error of more than $300,000.
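The filter example can be sketched numerically. The figures below are illustrative assumptions consistent with the article (a systematic 10-cent cost error across millions of filters), not the manufacturer's actual data:

```python
# Sketch: precise but inaccurate. Costs are carried to four decimal
# places (high precision), yet a systematic 10-cent allocation error
# (low accuracy) compounds across millions of filters.

filters_sold = 3_000_000
true_cost_per_filter = 1.2500      # dollars: the "true" value
recorded_cost_per_filter = 1.3500  # recorded to 4 digits, but off by $0.10

profit_error = (recorded_cost_per_filter - true_cost_per_filter) * filters_sold
print(f"${profit_error:,.2f}")  # $300,000.00
```

The extra decimal places buy nothing here; the systematic bias dominates, which is exactly the accuracy-versus-precision distinction the SLA4D metric must capture.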
Accuracy may also refer to non-quantitative data, such as customer names, customer addresses, customer segment categorization, product classifications or descriptions.
To measure and monitor accuracy, data usually needs to be validated or benchmarked against other external or internal data sources.
Completeness. Completeness is a measure of the presence or absence of data. For example, a top five bank determined that customer addresses it received from a third-party data provider were 95% complete. The missing 5% cost the bank millions of dollars in missed cross-selling opportunities. Completeness also pertains to retention requirements for historical data. To perform historical trending, business analysts often require historical data spanning several years to be accessible.
It is usually easy to measure the completeness of data, especially using available data profiling tools.
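A completeness check of the kind a profiling tool performs reduces to counting non-empty values. A minimal sketch, with hypothetical customer records:

```python
# Sketch of a completeness check, such as a data profiling tool would
# perform: the share of records with a non-empty value for a field.

def completeness(records, field):
    """Percentage of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return 100.0 * filled / len(records)

customers = [
    {"name": "Acme", "address": "1 Main St"},
    {"name": "Beta", "address": ""},
    {"name": "Gamma", "address": "9 Elm Ave"},
    {"name": "Delta"},  # address missing entirely
]
print(completeness(customers, "address"))  # 50.0
```

The result feeds directly into an SLA4D term such as "customer addresses shall be at least 99% complete per delivery."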
Reliability. Reliability is closely related to accuracy but is more of a relative measure of how much confidence one can place in the data values. Reliability is often used for data provided from external providers. For example, a bank receiving credit scores from a credit reporting firm believes the credit scores it receives are correct for 99.9% of its prospects. For a sample set of 100,000 customers, credit scores will be incorrect for 100 customers.
Data reliability can also pertain to the reliability of the data source. In competitive intelligence systems, sources are often rated for reliability. For example, the New York Times as a source will usually have a higher reliability rating than the National Enquirer. Primary sources most often receive higher ratings than secondary sources.
Just as with accuracy metrics, data usually needs to be validated or benchmarked against some other external or internal data sources in order to measure and monitor reliability.
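One way to estimate reliability is to benchmark a sample of supplied values against a trusted reference, as the text suggests. A minimal sketch with hypothetical credit scores:

```python
# Sketch: estimating reliability by comparing a sample of supplied
# values against a reference source. Keys and values are illustrative.

def reliability_pct(supplied: dict, reference: dict) -> float:
    """Share of benchmarked keys where supplied agrees with reference."""
    keys = supplied.keys() & reference.keys()
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if supplied[k] == reference[k])
    return 100.0 * matches / len(keys)

supplied_scores = {"cust1": 710, "cust2": 655, "cust3": 802, "cust4": 590}
reference_scores = {"cust1": 710, "cust2": 650, "cust3": 802, "cust4": 590}
print(reliability_pct(supplied_scores, reference_scores))  # 75.0
```

In practice the benchmark sample is small relative to the full feed, so the measured rate is an estimate of the 99.9%-style figure written into the agreement.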
Availability. Data is only of use if it is available when needed. This is especially true for managers relying on decision-support systems.
The SLA4D should specify when the data is expected to arrive and the frequency of delivery. Depending on how the data is used, the specification may be as fine-grained as minutes or as coarse as months. The data arrival date and time should be logged and tracked.
Timeliness. The Wall Street Journal is often referred to as the obituary column for Wall Street investors. By the time stock prices or stock “news” is published, the street has already priced the information into the stock, and it is too late for investors to act on it. Data almost always has an associated “timeliness” or “freshness” component. For stock traders, it may be real time, measured in seconds; for mortgage analysts, the time requirement may be as long as years.
Similar to availability metrics, the age of the data supplied needs to be specified in the SLA4D. To log and track age, the supplied data needs to be stamped with the relevant date and time. This is usually done through a date/time field associated with a file or with every supplied record.
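Both checks, arrival against the agreed window and age against the stamped date/time, are mechanical once the thresholds are in the SLA4D. A minimal sketch, with illustrative thresholds:

```python
# Sketch: checking one delivery's arrival time and data age against
# SLA4D targets. The thresholds and timestamps are illustrative.

from datetime import datetime, timedelta

MAX_DELIVERY_DELAY = timedelta(hours=2)  # agreed availability window
MAX_DATA_AGE = timedelta(days=1)         # agreed freshness

def check_delivery(expected_at, arrived_at, stamped_at):
    """Return (on_time, fresh) for one delivery."""
    on_time = arrived_at - expected_at <= MAX_DELIVERY_DELAY
    fresh = arrived_at - stamped_at <= MAX_DATA_AGE
    return on_time, fresh

expected = datetime(2024, 1, 15, 6, 0)
arrived = datetime(2024, 1, 15, 7, 30)  # 90 minutes late: within window
stamped = datetime(2024, 1, 13, 12, 0)  # data is almost two days old
print(check_delivery(expected, arrived, stamped))  # (True, False)
```

Note that a delivery can pass one test and fail the other: data can arrive on schedule yet already be stale, which is why the SLA4D needs both terms.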
Consistency and uniqueness. Data consistency refers to the common definition, understanding, interpretation and calculation of a data element. The use of this quality metric is illustrated by the following example. In the process of designing a performance data mart for a credit card bank, a multidepartment survey was conducted on the interpretation of company “profit.” Some departments equated profit to EBIT, while others did not. Some departments included the cost of capital, while others did not.
Uniqueness is closely related to consistency. For a data element to be consistent, it should also have unique identity and definition. To calculate the lifetime value of large customers, there needs to be a unique definition and method of calculation for “lifetime value” as well as “large” customers.
In specifying consistency and uniqueness, definitions and/or rules need to be given by the consumer to the data supplier (for example, “large customers” have $500 million or more in revenues, and “medium customers” have between $100 million and $499 million in revenues).
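Rules like these are most useful when both supplier and consumer apply one executable definition. A minimal sketch of the segment rule from the text (revenues in dollars; the "other" bucket is an assumption for revenues below $100 million):

```python
# Sketch: the segmentation rule from the SLA4D made executable, so
# supplier and consumer share a single, unique definition.

def customer_segment(annual_revenue: float) -> str:
    """Classify a customer by annual revenue in dollars."""
    if annual_revenue >= 500_000_000:
        return "large"
    if annual_revenue >= 100_000_000:
        return "medium"
    return "other"  # below $100M: not defined in the example rule

print(customer_segment(750_000_000))  # large
print(customer_segment(250_000_000))  # medium
print(customer_segment(50_000_000))   # other
```

Shipping the rule as code (or as a shared lookup table) removes the ambiguity that plagued the "profit" example above.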
Formats and Delivery Packaging
The SLA4D should also include the format of the data, how the data is packaged and how the data is delivered. This is especially critical given the highly publicized “losses” of customer data tapes and compromised PCs containing millions of customer records.
- In today’s world, a key element of an SLA4D should include detailed security procedures, from encryption to more secure handling during physical transfer.
- Delivery mechanisms may include U.S. mail, FedEx, Web site posting, e-mail, FTP or other software communication protocols.
- Packaging may include hard copy paper, CD, DVD or soft copy.
- Formats may include specific layouts, headers, footers, meta-tags, date stamps and styles.
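Layout and date-stamp terms can be enforced on receipt with a simple validation pass. In the sketch below, the expected header row and the `EXTRACT_DATE=` stamp convention are illustrative assumptions, not a standard:

```python
# Sketch: validating a delivered file against the layout terms agreed
# in the SLA4D. The header and date-stamp convention are assumptions.

import re

EXPECTED_HEADER = "cust_id|name|address|segment"
DATE_STAMP = re.compile(r"^EXTRACT_DATE=\d{4}-\d{2}-\d{2}$")

def validate_layout(lines):
    """Return a list of layout problems found in a delivered file."""
    problems = []
    if not lines or not DATE_STAMP.match(lines[0]):
        problems.append("missing or malformed date stamp")
    if len(lines) < 2 or lines[1] != EXPECTED_HEADER:
        problems.append("unexpected header row")
    return problems

good = ["EXTRACT_DATE=2024-01-15", "cust_id|name|address|segment",
        "1|Acme|1 Main St|large"]
bad = ["cust_id|name|address|segment", "1|Acme|1 Main St|large"]
print(validate_layout(good))  # []
print(validate_layout(bad))   # both problems reported
```

Rejecting a malformed file at the door, with a specific problem list, gives the supplier actionable feedback and creates the audit trail the agreement needs.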
Provider Data Quality Certification
The data provider may be willing to provide some level of data quality certification based on its own internal testing. The SLA4D should specify a meta-tag associated with the data, indicating the level of data quality provided. A provision should also allow the consumer to examine the provider’s data quality test methods and test results.
Ability to Change the SLA4D
As business needs change, the data requirements and the associated data quality requirements also change. The SLA4D should provide a mechanism for communicating and implementing changes in data quality requirements as well as the data formats and delivery mechanisms. For example, a company may undergo a merger and require a redefinition of sales territories, regions and districts. It may also require a change in data delivery destinations.
Negotiating the Agreement – Lessons Learned
The following are best practice guidelines drawn from agreement negotiation sessions:
- Form a partnership between IT, end users and contracts. These three groups need to work closely to negotiate an effective contract with the data providers.
- Develop a template. Working with the stakeholders, develop an SLA4D template as an agreed-upon starting point. The template should include guidelines on what is negotiable and what is not, as well as pricing guidelines for data quality needs. Spending the time to develop the template upfront will save considerable time in the negotiation process.
- Create measurable terms. Only include terms in the SLA4D that can be measured and monitored. If the vendor is offering a level of data certification, it needs to be proven.
- Recognize that this is a gradual process. Different providers have different cultures and varying tolerance for changing their products and associated services. The consumer will need to educate the data provider and clearly communicate the consumer’s needs. This can be a time-consuming process, and terms may have to be negotiated over numerous purchasing cycles.
- Sell it as a win/win. Even if the data consumer is willing to pay for “custom” changes to the data sources, data providers are more willing to change if they can see a broader market for similar data quality needs. They need to view this as an opportunity to offer new products and services. Again, it may be necessary to educate the provider about the broader opportunities with your company or with the market in general.
- Find alternate providers. Even if there is a preferred provider, identify alternatives to gain negotiation leverage.
- Institute penalties for non-compliance. As with any contract, it is necessary to specify penalties for non-compliance with the SLA terms and to calculate the costs of receiving poor quality data or data not delivered in the agreed-upon formats, packaging or schedule.
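A non-compliance penalty only works if it is computable from the measured metrics. A minimal sketch; the rate and thresholds are illustrative assumptions, not from the article:

```python
# Sketch: turning an SLA4D quality shortfall into a penalty amount.
# The percentages, volume and per-record rate are illustrative.

def penalty(agreed_pct: float, delivered_pct: float,
            records: int, cost_per_bad_record: float) -> float:
    """Charge for each record falling below the agreed quality level."""
    shortfall = max(0.0, agreed_pct - delivered_pct) / 100.0
    return shortfall * records * cost_per_bad_record

# 99% completeness agreed, 95% delivered, 1M records, $0.50 per bad record
print(round(penalty(99.0, 95.0, 1_000_000, 0.50), 2))  # 20000.0
```

Tying the rate to the consumer's actual cost of bad data (such as the missed cross-selling opportunities in the completeness example) makes the penalty defensible in negotiation.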
Monitoring SLA4D Performance
To enforce the agreed-upon terms in the SLA4D, the data metrics specified need to be monitored for compliance. In a previous article, Six Sigma Data Quality Processes, I described the offline and in-line data quality (DQ) processes for both monitoring and controlling data quality.
As shown in Figure 1, there are several DQ process probe points available for monitoring data quality. Data collected from these points can be analyzed, reported and distributed to suppliers to communicate data quality performance.
Figure 1: DQ Processes
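The measurements gathered at these probe points roll up into a per-supplier compliance report. A minimal sketch, in which the metric names and targets are illustrative assumptions:

```python
# Sketch: summarizing metrics collected at DQ probe points into a
# pass/fail compliance report against SLA4D targets. Names and
# thresholds are illustrative.

def compliance_report(measurements: dict, targets: dict) -> dict:
    """Compare each observed metric against its SLA4D target."""
    report = {}
    for metric, observed in measurements.items():
        target = targets.get(metric)
        ok = target is not None and observed >= target
        report[metric] = "pass" if ok else "fail"
    return report

targets = {"completeness_pct": 99.0, "accuracy_pct": 99.5, "on_time_pct": 98.0}
observed = {"completeness_pct": 99.4, "accuracy_pct": 98.7, "on_time_pct": 100.0}
print(compliance_report(observed, targets))
# {'completeness_pct': 'pass', 'accuracy_pct': 'fail', 'on_time_pct': 'pass'}
```

Distributing a report like this to the supplier on every cycle keeps the conversation factual: each "fail" maps to a specific contracted term.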
Data Providers, Consumers and Changing Roles
Figure 2 shows a typical data value chain, starting with raw source data and ending with an end consumer of the final data. The providers in the value chain actually act in two roles: intermediate consumers of data and providers of data to the next consumer in the value chain.
A typical example may be a data source aggregator, such as Dow Jones, providing raw data to an IT group or data center within a bank. The IT group may transform and “cleanse” the data and subsequently populate a central data warehouse. The bank’s customer analysis performance group then extracts the data from the warehouse into a customer data mart, performs statistical analysis and generates intermediate data products and reports. The reports are delivered and/or posted to a Web site for management consumption and decision making.
Figure 2: Data Providers and Consumers
Another example is the emerging trend in which a financial institution outsources one or more of the provider steps. An institution may do this for several reasons, including:
- Cost reduction
- Higher quality data available faster
- Regulatory compliance
- Faster time to market
Recently, a top ten bank attributed a 15% cost savings to outsourcing a critical part of its data provisioning.
Regardless of position within the data value chain, an SLA4D should exist between each consumer and each data provider, whether the provider is internal or external.
The Basis for Sound Analysis
As critical business decisions depend more and more on data supplied by data providers, it is vital for this data to be of the highest achievable quality. Companies must begin establishing “measurable” service level agreements with their data suppliers to secure high-quality, certified data as the basis for sound analysis and business decisions.