
Optimize your data governance policy to unleash analytics users

Creating and enforcing data governance policies is a must in self-service BI initiatives, but a collaborative approach can avoid stifling users -- and produce good analytics results.

Data governance is kind of like going to the dentist: Everyone has to do it, but few people really enjoy the process.

But in a world where companies are increasingly adopting self-service business intelligence tools and putting BI data in the hands of decision makers and operational workers, implementing a strong data governance policy with clear rules of the road has never been more important.

That doesn't mean, however, that data management teams should take a strong-arm approach to enforcing governance policies aimed at ensuring that information is accurate and consistent throughout an organization. At the 2015 TDWI Executive Summit in Las Vegas, several speakers discussed ways that BI and IT managers can protect the quality of their data while still empowering business users and data analysts to run their own queries and build their own data visualizations with self-service software.

"We don't want to be the police," said Alexandre Synnett, vice president of data management at CDPQ, a company that manages public-sector pension funds for the Canadian province of Quebec. "Writing a 10-page document with rules saying what everybody can do -- that's not going to be our approach."

Instead, CDPQ opted for a more collaborative approach. Synnett said that rather than developing a data governance framework on its own, the data management and analytics team worked with business managers to identify potentially sensitive data and set pragmatic limitations on how that data can be used while still enabling business needs to be met. They then built governance controls and related business rules into the company's back-end systems. The work also involved taking data from siloed systems implemented by business units and centralizing storage, management and access.
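Governance controls like the ones CDPQ built into its back-end systems often come down to business rules evaluated before data is served to a self-service user. The sketch below is a hypothetical illustration of that idea, not CDPQ's actual implementation; the sensitivity tags, roles and rules are all invented.

```python
# Hypothetical sketch of back-end governance rules: each dataset carries a
# sensitivity tag, and a rule table maps tags to the roles allowed to query
# them. Tags, roles and rules here are invented for illustration.

SENSITIVITY_RULES = {
    "public": {"analyst", "manager", "developer"},
    "internal": {"analyst", "manager"},
    "restricted": {"manager"},
}

def can_access(dataset_tag: str, user_role: str) -> bool:
    """Return True if a user with this role may query a dataset with this tag."""
    allowed_roles = SENSITIVITY_RULES.get(dataset_tag, set())
    return user_role in allowed_roles

# A self-service query is checked against the rules before it runs.
assert can_access("internal", "analyst")
assert not can_access("restricted", "analyst")
```

Because the check lives in the back end rather than in a policy document, the "10-page document" Synnett wants to avoid becomes unnecessary: the rules are enforced automatically wherever data is requested.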

Mike Lampa, managing partner at consultancy Archipelago Information Strategies, called that kind of approach "governance without the gavel." At most organizations, he said, the governance process traditionally has been focused on limiting what people can do with data. There are good reasons for that: Without such restrictions, end users could feed inaccurate or low-quality analytics results back into a data warehouse or inadvertently publish data that should be kept private for legal reasons.

But Lampa cautioned that focusing only on stopping users from doing things as part of a data governance policy diminishes the value of self-service tools and slows the ability of analysts and other workers to derive potentially valuable insights from data.

That's why he recommended being more flexible, particularly in the types of data that users are allowed to bring into applications. Forcing every piece of data through standards that require certain formats will make it harder for analytics teams to work with new data types, like social media data or Web clickstream data, Lampa said. He added that if IT managers are worried about messy data sets getting into data warehouses, they can set up segregated data sandboxes for analysts to play around in. But just saying "no" isn't a good option.
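The segregation Lampa describes can be pictured as a routing decision at ingestion time: data that meets warehouse standards goes to the governed staging area, while messy or novel data lands in an analyst sandbox instead of being rejected outright. The sketch below is an assumption-laden illustration; the zone names, required fields and validation check are invented, not drawn from any vendor's product.

```python
# Hypothetical routing of incoming data sets: batches that conform to
# warehouse standards go to governed staging, everything else goes to a
# sandbox zone rather than being refused. Names and checks are invented.

REQUIRED_FIELDS = {"record_id", "timestamp"}

def route_dataset(records: list) -> str:
    """Return the destination zone for a batch of record dicts."""
    if all(REQUIRED_FIELDS <= set(r) for r in records):
        return "warehouse.staging"      # conforms to warehouse standards
    return "sandbox.end_user_zone"      # explore first, clean up later

clean = [{"record_id": 1, "timestamp": "2015-08-10", "value": 42}]
clickstream = [{"url": "/home", "ts": 1439164800}]  # nonstandard format

assert route_dataset(clean) == "warehouse.staging"
assert route_dataset(clickstream) == "sandbox.end_user_zone"
```

The point of the design is that the answer to nonconforming data is "not here," rather than "no": clickstream or social media data still gets analyzed, just in a zone where it can't contaminate the warehouse.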

"We have to have the ability to allow data scientists to bring in external data and blend it [with internal information]," Lampa said. "It's about turning all this data into an asset on our balance sheet."

CDPQ has gone the sandbox route, building what it calls "end-user zones" into its data warehouses to support analytics prototyping and exploration. But the sandboxes aren't anything-goes environments. Luc Veillette, senior director of modeling and business analytics at CDPQ, said the company's data governance program includes a set of rules on using them -- for example, analysts must use official data sources if available, and their findings can be tapped for decision making only by individual business units.
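The "official data sources if available" rule Veillette mentions is the kind of policy that can be enforced in code rather than on paper. The lookup below is a hypothetical sketch of that idea; the registry contents and dataset names are invented for illustration.

```python
# Hypothetical source resolution that prefers an official, governed source
# over a sandbox copy when both exist. Registry contents are invented.

OFFICIAL_SOURCES = {"positions": "warehouse.positions"}
SANDBOX_SOURCES = {
    "positions": "sandbox.positions_scratch",
    "tweets": "sandbox.social_feed",
}

def resolve_source(dataset: str) -> str:
    """Return the official source if one exists, else the sandbox copy."""
    if dataset in OFFICIAL_SOURCES:
        return OFFICIAL_SOURCES[dataset]
    return SANDBOX_SOURCES[dataset]

assert resolve_source("positions") == "warehouse.positions"
assert resolve_source("tweets") == "sandbox.social_feed"
```

With a resolver like this in the analytics tooling, an analyst who experiments on a scratch copy of governed data is steered back to the official version automatically once one exists.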

In keeping with the emphasis on collaborative governance, though, Veillette said IT developers and the data scientists and business analysts who use the sandboxes work side by side to incorporate analytical models and algorithms into a manageable structure.   

Selecting the right tools for specific analytics purposes can also help support the use of different data types. Mark Madsen, president of research and consulting firm Third Nature, said many analytics and reporting tools available today can handle data in different formats. Businesses often look for a single tool that can perform many functions, but such software may require users to pull data out of the data warehouse and reformat it before it can be analyzed. Implementing a specialized tool just to analyze, say, social media data adds more software to manage and increases the complexity of the data warehouse environment, but that tradeoff may be worth it.

"You have to think about platforms that give you all those things and relax control, because we slow a lot of things down by making everything go through ETL," he said.

At the same time, Madsen recommended that organizations be judicious about what they allow into data warehouses and their other analytics systems. Insights can be found by blending CRM data, product information and transaction data, but pulling those data types into a single analytical process can also slow things down. Analytics is ultimately about finding signals in the noise, and the more data you add, the noisier things get.

"Once you've got that much data," Madsen asked, "how do you know how much is meaningful?"

Ed Burns is site editor of SearchBusinessAnalytics. Email him at [email protected] and follow him on Twitter: @EdBurnsTT.
