Guide to managing a data quality assurance program
As business intelligence and analytics adoption expands within organizations, there's a natural push from business users to add more types of data from disparate sources -- often external ones -- to BI systems. That raises questions about the validity, consistency and overall quality of the data being added -- questions that need to be answered before data errors result in flawed analysis.
Although data quality issues have always been a concern for BI managers, trends such as the increasing focus on big data and social media analytics exacerbate the challenges of ensuring high quality levels because of the broader range of information companies are now collecting. In the past, most of the data in a BI system was maintained internally, and many businesses overlooked the need for a formal data quality strategy. That's no longer a valid approach to managing BI data.
To get the most out of a BI program, organizations must address their data quality problems head-on, taking advantage of commercial data quality tools to support the efforts when they're a good fit. BI systems can provide companies with valuable business insights, but only if they're implemented and managed in a way that engenders trust in the data being analyzed.
Faulty thinking on data fixes
In many businesses, there's a false expectation that BI tools will solve data errors themselves. The reality is that business intelligence applications bring previously undetected data quality issues to the forefront. Invalid part numbers, inaccurate dollar amounts and multiple customer records with inconsistent data are only some of the data quality shortcomings that are likely to be identified as part of a BI project.
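To illustrate how such problems surface, the following minimal Python sketch profiles a batch of order rows for the three kinds of issues named above. The field names (`email`, `name`, `part_number`, `amount`), the part-number format and the rules for a valid dollar amount are hypothetical assumptions chosen for the example. They are not rules taken from any particular BI or data quality tool.

```python
import re
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Hypothetical part-number format: two uppercase letters, a hyphen, four digits.
PART_NUMBER_PATTERN = re.compile(r"[A-Z]{2}-\d{4}")


def profile_orders(rows):
    """Return lists of problem rows (by index) and problem customers (by email).

    Each row is assumed to be a dict of strings with the hypothetical keys
    'email', 'name', 'part_number' and 'amount'.
    """
    issues = {"invalid_part_number": [], "bad_amount": [], "inconsistent_customer": []}
    names_by_email = defaultdict(set)

    for i, row in enumerate(rows):
        # Flag part numbers that do not match the expected format.
        if not PART_NUMBER_PATTERN.fullmatch(row["part_number"]):
            issues["invalid_part_number"].append(i)

        # Flag amounts that are unparseable, non-finite, negative,
        # or carry more than two decimal places.
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            issues["bad_amount"].append(i)
        else:
            if not amount.is_finite() or amount < 0 or amount.as_tuple().exponent < -2:
                issues["bad_amount"].append(i)

        # Group names by normalized email to spot duplicate customers with conflicting data.
        names_by_email[row["email"].strip().lower()].add(row["name"].strip())

    for email, names in names_by_email.items():
        if len(names) > 1:
            issues["inconsistent_customer"].append(email)

    return issues


sample = [
    {"email": "ann@example.com", "name": "Ann Lee", "part_number": "AB-1234", "amount": "19.99"},
    {"email": "ANN@example.com ", "name": "Anne Lee", "part_number": "AB1234", "amount": "-5.00"},
    {"email": "bo@example.com", "name": "Bo Chan", "part_number": "CD-0001", "amount": "12.345"},
]
result = profile_orders(sample)
print(result)
# {'invalid_part_number': [1], 'bad_amount': [1, 2], 'inconsistent_customer': ['ann@example.com']}
```

Data profiling tools typically automate checks of this kind across entire tables. The point of the sketch is that each rule has to be defined explicitly somewhere, because a BI tool will report these rows as they are rather than correct them.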
If information is inaccurate in source systems, those inaccuracies will eventually carry over into the BI system unless something is done to correct them. But what is the most effective and efficient way to manage BI data quality improvement efforts? And who should be responsible for BI data quality management?
In a perfect world, assessing and correcting data at the source is the best approach, along with getting business users actively involved in ensuring that data is clean to begin with. But real-world situations often make that difficult. Many operational systems were developed years or even decades ago without processes for correcting inaccurate and inconsistent data entries. A majority of companies have yet to implement master data management programs and systems that use master reference data to help identify and fix quality issues as data is entered into systems. For many organizations, managing data quality at the source means high costs and a lot of programming hours.
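For comparison, checking data at the source can look something like the following minimal Python sketch. It validates a part number against master reference data at the point of entry and suggests close matches when the value isn't found. The master list here is a hypothetical in-memory set standing in for a master data management hub; a real system would look values up in its master data store. The suggestions come from the standard library's `difflib.get_close_matches`.

```python
import difflib

# Hypothetical master reference list of valid part numbers (stand-in for an MDM hub).
MASTER_PART_NUMBERS = {"AB-1234", "AB-1243", "CD-0001", "EF-5000"}


def validate_part_number(entered: str) -> tuple[bool, list[str]]:
    """Check an entered part number against master data at the point of entry.

    Returns (is_valid, suggestions). When the normalized value is not in the
    master list, suggestions holds up to three close master values so the
    person entering the data can correct it before it is saved.
    """
    value = entered.strip().upper()
    if value in MASTER_PART_NUMBERS:
        return True, []
    return False, difflib.get_close_matches(value, sorted(MASTER_PART_NUMBERS), n=3, cutoff=0.8)


print(validate_part_number(" ab-1234 "))  # (True, [])
print(validate_part_number("CD-001"))     # (False, ['CD-0001'])
```

The catch, as noted above, is that many older operational systems have no hook where a check like this could run, so adding one means changing each application.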
In most businesses, then, data quality measures likely need to take place in the BI and data warehouse layer, through a concerted effort, before data is made available for analysis. It's a simpler process to correct data residing in a centralized database, or as information is being integrated for loading into one, than it is to resolve problems separately in operational systems. In addition, software vendors have developed robust data profiling and cleansing tools that can automate data quality processes and are best deployed and used by a central team. Managing data quality activities in one place should result in lower costs compared with modifying data in individual source systems.
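Here is a minimal Python sketch of correcting data "as information is being integrated for loading." Incoming customer records are standardized and then deduplicated before they reach the warehouse. The `Customer` fields, the country-code mapping and the survivorship rule (keep the most recently updated record per email address) are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass, replace
from datetime import date

# Hypothetical reference mapping used to standardize free-text country values.
COUNTRY_CODES = {"usa": "US", "united states": "US", "us": "US", "uk": "GB", "united kingdom": "GB"}


@dataclass(frozen=True)
class Customer:
    email: str
    name: str
    country: str
    updated: date


def standardize(c: Customer) -> Customer:
    """Trim and normalize fields so that equivalent values compare equal."""
    country = c.country.strip()
    return replace(
        c,
        email=c.email.strip().lower(),
        name=" ".join(c.name.split()),  # collapse repeated whitespace
        # Map known spellings to a code; otherwise just uppercase the trimmed value.
        country=COUNTRY_CODES.get(country.lower(), country.upper()),
    )


def cleanse_for_load(records: list[Customer]) -> list[Customer]:
    """Standardize records, then keep one record per email (the most recently updated)."""
    survivors: dict[str, Customer] = {}
    for record in map(standardize, records):
        current = survivors.get(record.email)
        if current is None or record.updated > current.updated:
            survivors[record.email] = record
    return sorted(survivors.values(), key=lambda c: c.email)


raw = [
    Customer(" Ann@Example.com", "Ann  Lee", "usa", date(2023, 1, 5)),
    Customer("ann@example.com", "Ann Lee", "United States", date(2023, 6, 1)),
    Customer("bo@example.com", "Bo Chan", "uk", date(2023, 3, 2)),
]
cleansed = cleanse_for_load(raw)
for customer in cleansed:
    print(customer)
```

Because the rules live in a single integration step rather than in each source system, a central team can adjust them in one place. That is the basis of the cost argument for managing data quality in the BI and data warehouse layer.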
Gravitate to the center on data quality
Both strategies, fixing data in source systems and fixing it through BI projects, are perfectly valid. But organizations with a strong BI infrastructure overseen by a business intelligence competency center or other central team will be best served by leaning toward the latter approach. That approach doesn't cover data that remains outside a data warehouse or other analytical systems, of course. If a company's business users want to use such data in analytics applications, BI and IT managers will need to develop processes for bringing it into their data quality strategy.
As BI processes mature, data becomes more varied, and companies increasingly try to use analytics to get ahead in a highly competitive global marketplace, the quality of information in BI systems will affect whether business users trust the findings of queries and reports. Ultimately, it will affect their willingness to rely on, and continue using, the BI environment. Decision makers need access to data sets that are relevant and valid, and they need to be confident that the data is both.
Successful initiatives might even spur businesses to look beyond data quality for BI toward an organization-wide approach to information quality, with the backing of key executives who have seen the benefits of clean and consistent data firsthand. That support will be crucial as BI and IT teams work to create effective data governance policies and procedures to ensure higher quality levels across both operational and BI architectures.
Lyndsay Wise is president and founder of WiseAnalytics, an independent research and analysis firm that focuses on business intelligence deployments at small and midsize businesses. Wise has more than 10 years of experience in business systems analysis, software selection and implementation of enterprise applications. Email her at email@example.com.