This article originally appeared on the BeyeNETWORK.
The more I look at approaches to Customer Data Integration (CDI),
Over the past two weeks I have participated in three training sessions. Two of these were courses I taught at the TDWI World Conference in Orlando (one on data profiling and one on data quality business rules), and the third was a half-day workshop on customer data integration sponsored by a data quality tools vendor. Interestingly, most of the questions focused on standardized names and on the semantics associated with the data elements used in end-user applications. Although the TDWI attendees seemed most interested in data warehousing and business intelligence implementations, and the CDI attendees in customer data hubs, the underlying barriers to success were basically the same. These barriers included:
- Low levels of data quality.
- Inability to deploy data quality tools effectively.
- Issues related to business client acceptance and adoption of the analytic application.
- The need to standardize commonly used business terms.
- Building an acceptable business case with a predictable return on investment.
To focus more closely on the CDI problem, we can explore Gartner’s definition of customer data integration:
“Customer data integration is the combination of the technology, processes and services needed to create and maintain an accurate, timely and complete view of the customer across multiple channels, business lines and enterprises where there are multiple sources of customer data in multiple application systems and databases.”
We can reinterpret this definition to provide practical guidance for an implementation. Gartner’s directive to “create and maintain an accurate, timely and complete view” essentially means that the system must reliably enforce data quality assertions in a way that can be measured and reported. Similarly, the reference to “multiple sources of customer data in multiple application systems and databases” means that one must be able to provide complete and efficient data integration, as well as coordinated aggregation across heterogeneous data sources. This implies collecting the variant representations associated with each individual without losing the meaning carried by those variant forms. When Gartner says “across multiple channels, business lines and enterprises,” this calls for matrixed collaboration between technical and business managers across vertical lines of business.
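To make “measured and reported” concrete, here is a minimal sketch, assuming data quality assertions can be written as simple pass/fail predicates over customer records. The record fields (`last_name`, `zip`), the `Assertion` type and the `measure` function are all hypothetical illustrations, not part of any particular CDI product; the point is only that each assertion yields a conformance rate that can be tracked over time.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical customer record: field names are illustrative only.
Record = dict[str, str]


@dataclass
class Assertion:
    name: str
    check: Callable[[Record], bool]  # True when the record satisfies the rule


def measure(records: list[Record], assertions: list[Assertion]) -> dict[str, float]:
    """Return, for each assertion, the fraction of records that satisfy it.

    An empty record set is reported as fully conforming (1.0), a design choice
    made for this sketch.
    """
    report = {}
    for a in assertions:
        passed = sum(1 for r in records if a.check(r))
        report[a.name] = passed / len(records) if records else 1.0
    return report


# Two hypothetical assertions: a completeness rule and a format rule.
assertions = [
    Assertion("last_name_present", lambda r: bool(r.get("last_name", "").strip())),
    Assertion("zip_is_5_digits",
              lambda r: len(r.get("zip", "")) == 5 and r.get("zip", "").isdigit()),
]

records = [
    {"last_name": "Loshin", "zip": "20910"},
    {"last_name": "", "zip": "2091"},
    {"last_name": "Loshin", "zip": "20910"},
    {"last_name": "Smith", "zip": "ABCDE"},
]

for name, rate in measure(records, assertions).items():
    print(f"{name}: {rate:.0%}")
```

Run against these four sample records, the report shows 75% conformance for the completeness rule and 50% for the format rule; a real deployment would draw its rules from the business rather than from the developer.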
So what does this really mean to the practitioner?
This actually goes back to my first point. I am convinced that success depends on fundamental information quality principles. Because CDI requires integrating data to identify entities and then resolve each one into a single “object,” we can use traditional data cleansing technologies (namely, standardization and matching) to locate the individual identities that exist across the environment and link the similar ones together.
There is one major difference between the traditional cleansing approach and the CDI approach. Whereas traditional cleansing modifies all the variant versions into a single “correct” version, CDI applications call for almost the exact opposite. When you “correct” one version, you lose the information that was embedded within it. For example, changing “Howard David Loshin” into “David Loshin” loses the fact that my first name is “Howard.” This in turn reduces the probability of matching “David Loshin” to “H. Loshin,” since the information that establishes the initial “H” is eliminated. In a world where you can expect a great deal of variation, it is best to capture as much knowledge as possible about potential aliases for a single entity.
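The “Howard David Loshin” example can be sketched in a few lines. This is a toy illustration, assuming a deliberately naive matching rule (surnames must be identical, and every given-name token in the query must agree with some given-name token in a retained alias, where a single letter counts as an initial). The functions `standardize`, `name_matches` and the `Entity` class are hypothetical names for this sketch, not a production matcher or any vendor's API. It contrasts an entity whose variants were “corrected” into one form with an entity that keeps every observed variant as an alias.

```python
from dataclasses import dataclass, field


def standardize(name: str) -> list[str]:
    # Standardization step: uppercase, treat periods and commas as separators, split into tokens.
    return name.upper().replace(".", " ").replace(",", " ").split()


def token_agrees(q: str, c: str) -> bool:
    # A single-letter token is treated as an initial and agrees with any token starting with it.
    if len(q) == 1 or len(c) == 1:
        return q[0] == c[0]
    return q == c


def name_matches(query: str, candidate: str) -> bool:
    # Toy matching rule (an assumption): identical surnames, and every given-name token
    # in the query must agree with some given-name token in the candidate.
    q, c = standardize(query), standardize(candidate)
    if len(q) < 2 or len(c) < 2 or q[-1] != c[-1]:
        return False
    return all(any(token_agrees(qt, ct) for ct in c[:-1]) for qt in q[:-1])


@dataclass
class Entity:
    aliases: list[str] = field(default_factory=list)

    def matches(self, name: str) -> bool:
        # A new name links to the entity if it matches ANY retained alias.
        return any(name_matches(name, alias) for alias in self.aliases)

    def link(self, name: str) -> None:
        # Keep the variant as an alias instead of overwriting it with a "correct" form.
        if name not in self.aliases:
            self.aliases.append(name)


# Traditional cleansing: the variants were collapsed into one "correct" version.
cleansed = Entity(aliases=["David Loshin"])

# CDI approach: every observed variant is retained.
resolved = Entity()
for variant in ["Howard David Loshin", "David Loshin"]:
    resolved.link(variant)

print(cleansed.matches("H. Loshin"))  # False: the "Howard" evidence was discarded
print(resolved.matches("H. Loshin"))  # True: a retained alias supplies the initial "H"
```

The only difference between the two entities is whether the variant was kept, which is exactly the information the “corrected” record throws away.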
I am also convinced that projects will fail without a facilitated business/technical alliance. As I listened to attendees’ questions, I noticed that the issues were not about technical approaches, but rather about how to build an effective business case that would convince those who control the budget to fund data quality projects. When I polled the audience to determine whether they were business or IT staff, however, it was clear that most attendees were IT staff. In other words, business people expect IT staff to build the business cases that justify IT’s work in the context of the business.
This is fine, except for one small issue: IT personnel usually do not have the training to develop the business cases that business staff want. This leads to communication gaps, which are widened by the management structures imposed by vertical organizations with consolidated IT departments, as well as by the lack of precision typically applied to the definitions of common business terms. Speaking from my own experience, when I studied computer science in college, we did not learn how to develop business cases, ROI models or cost justifications. We learned about data structures, algorithms and programming. Yet, in spite of this, successful projects frequently occur. These are the projects where the IT and business staff work together to establish success criteria that all team members can relate to.
So how can your organization establish successful CDI projects?
The first step is to recognize the apparent adversarial relationship as a potential problem and plan to bring in neutral parties, who can dissolve those “cooperation boundaries” and create a unified team. The second step is to understand how the CDI initiative will use data quality tools so that it can best exploit the technology. I will examine this idea more carefully in an upcoming article.
If you would like to share any similar issues or experiences, feel free to contact me at firstname.lastname@example.org.