This content is part of the Essential Guide: GDPR, AI intensify privacy and data protection compliance demands
Analytics an uneasy balance between data collection and privacy

In the age of GDPR and privacy regulations, attention must be paid to user privacy. Data management tools that employ AI as part of analytics can help achieve that.

Advanced analytics, BI and AI are booming and can potentially offer great business benefits, but these technologies are extremely data hungry. Meanwhile, GDPR and other privacy regulations are forcing companies to re-evaluate how much data they collect and what they do with it.

One of those companies is Ruffalo Noel Levitz LLC, an Iowa-based enrollment and fundraising firm that works with around 1,900 campuses and nonprofits. The company touches 240 million prospective students and donors each year.

The company recently decided to invest in a formal data governance program to help ease the tension between data collection and privacy concerns.

"Compliance with data protection and privacy regulations is just one reason in a long list of reasons why," said Alison Burchett, Ruffalo Noel Levitz's associate vice president of product management and data governance.

In particular, the company was looking for a way to track data as it moves through all of its systems.

"We handle data across the student lifecycle, from college recruitment all the way through alumni giving," she said. "Which means we are tracking a single person through multiple data systems over the course of many years."

The biggest regulations that companies like Ruffalo Noel Levitz face today are Europe's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act, but there are also regulations specific to particular industry verticals or business functions. In addition, many other jurisdictions are following in Europe's and California's footsteps or are seriously considering it.

But getting a handle on the balance between data collection and privacy isn't just good for compliance. It also has business benefits -- including benefits for a company's data analytics projects.

"One of our biggest challenges is establishing a single 360-degree view of our constituent records," Burchett said.

Deploying a data governance solution from Infogix Inc. helped provide Ruffalo Noel Levitz with that 360-degree view. And being able to track data accurately also means that the company can move toward more advanced technologies.

"My hope is that we'll be able to leverage machine learning -- at least for data quality purposes -- in the near future," Burchett said.

Privacy and analytics can coexist

Ruffalo Noel Levitz isn't alone in its effort to balance data collection and privacy efforts. Many companies are implementing systems that allow them to comply with new privacy regulations and that will also make collected data more readily available for AI, BI and other analytics platforms, said Paige Bartley, senior analyst for the data, AI and analytics team at 451 Research.

Take, for example, GDPR's requirement that companies be able to show their customers all the data they've gathered about them. For most companies, that's a very onerous task.

Meanwhile, analytics tools looking for patterns in customer behavior also need access to all the information a company has on each individual.

"The common requirement in both use cases is to associate all the data in a company to the same customer identity," Bartley said. "They're two sides of the same coin."

It's true that, in the short run, data collection and privacy regulations mean that companies will have less data to work with. But, in the long run, they'll benefit because the data they do have will be of higher quality.

"It's an opportunity to build trust with customers," Bartley said. "They volunteer more accurate personal data and there's less obfuscation behavior."

For example, she said, people may provide fewer junk email addresses if they can trust companies not to abuse this information.

Making sense of siloed data

The reason it's so hard to get a good view of any particular customer -- and ensure the customer's privacy -- is that the collected data is scattered over many different systems.

"This is the overarching challenge that businesses have," said Anthony Di Bello, vice president of strategic development at OpenText, which offers the EnCase eDiscovery tool. "It's a privacy problem. It's a security problem. It's a risk and legal problem."

Some of the data might be siloed in a customer management database and some in a transactional database. There might be emails to and from customers stored in individual employee mailboxes. There might be files containing customer information on cloud file-sharing sites. There might be data scraped from social media, product reviews the customer has written, images they've uploaded, conversations with chatbots or recordings of support calls. Any of that collected data could potentially hold personally identifiable information that falls under the purview of data privacy regulations.

Plus, there are the backups of all these systems. There could also be leftover files in applications that are no longer being used, or in random spreadsheets and text documents.

And the data is inconsistent. Customers move and change phone numbers or email addresses or just provide information in different formats each time. Different systems use different key information, such as an email address here, a customer number there and a totally different user ID somewhere else. And there are typos.
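To make the problem concrete, here is a minimal Python sketch of how records from different systems might be linked once their key fields are normalized. It is an illustration only, not any vendor's method: the record layout, the field names `email` and `phone`, and the assumption that phone numbers are U.S. numbers are all hypothetical. It also handles only formatting differences; typos would call for fuzzy matching, which is left out here.

```python
import re
from collections import defaultdict


def normalize_email(value):
    """Trim and lowercase so trivially different spellings compare equal."""
    return value.strip().lower()


def normalize_phone(value):
    """Keep digits only and drop a leading U.S. country code (assumption)."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def link_records(records):
    """Group record indexes that share any normalized email or phone."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (field, normalized value) -> first record index
    for idx, rec in enumerate(records):
        keys = []
        if rec.get("email"):
            keys.append(("email", normalize_email(rec["email"])))
        if rec.get("phone"):
            keys.append(("phone", normalize_phone(rec["phone"])))
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical records from three systems, each keyed differently.
records = [
    {"system": "crm", "email": " Jane.Doe@Example.com ", "phone": "(319) 555-0100"},
    {"system": "billing", "email": "jane.doe@example.com"},
    {"system": "support", "phone": "+1 319-555-0100"},
    {"system": "crm", "email": "other@example.com"},
]
linked = link_records(records)
```

In this made-up data, the first three records end up in one group because the billing record shares an email address with the CRM record and the support record shares a phone number with it, even though the billing and support records have no field in common.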

Many enterprises don't even have a way to search all their different systems, databases, cloud services and employee desktops. And if they were able to run, say, a keyword search, the results would be messy and unusable -- like web searching was before Google came along. Relevant results may also be missing because they don't include that specific keyword.

"Say I'm searching for information about an individual," Di Bello said. "There may be information about that individual that's not identified by name, but by some reference number."
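One way to picture the fix is a search that first expands a person's name into every identifier known to be linked to it, then looks for any of them. The Python sketch below assumes a hypothetical cross-reference table (`crossref`) and an in-memory set of documents; it is not how EnCase or any other product works, and a real system would have to build and maintain that table.

```python
from collections import deque


def expand_identifiers(seed, crossref):
    """Collect every identifier transitively linked to the seed (breadth-first)."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for other in crossref.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def search_documents(seed, crossref, documents):
    """Return IDs of documents mentioning the seed or any linked identifier."""
    terms = {term.lower() for term in expand_identifiers(seed, crossref)}
    return [
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    ]


# Hypothetical cross-reference table and documents.
crossref = {
    "Jane Doe": ["REF-1042"],
    "REF-1042": ["Jane Doe", "jane.doe@example.com"],
}
documents = {
    "email-1": "Meeting with Jane Doe on Tuesday",
    "ticket-7": "Refund issued for REF-1042",
    "note-3": "Unrelated memo",
}
hits = search_documents("Jane Doe", crossref, documents)
```

A plain keyword search for the name would find only `email-1`; the expanded search also returns `ticket-7`, which mentions the person only by reference number.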

Pulling all this data into a single data lake could help, but it isn't always an option.

"Some jurisdictions have rules about sending data outside the region," said Margaret Alston, director of consulting at TrustArc, a San Francisco-based privacy compliance company. "Even in a single region, like the EU [European Union], there are some country-specific differences."

Plus, companies have to decide whether they want to follow the most stringent privacy standards for all their collected data and potentially lose out on business opportunities or create separate systems for the more lenient jurisdictions and then deal with increased complexity. And that's just the tip of the iceberg.

Companies may need to get consent in certain locations but not others, or they may face limitations on how much data they can collect. In some regions or industries, there might be additional concerns about how data is used. For example, if a company uses AI to decide whether to give someone a loan, that might run up against financial regulations.

Finally, if all the data is in one big basket, that makes it a very tempting target.

"Think about the data breaches, the risks of what can happen when you put a lot of data in one place," Alston said.

AI may improve data privacy

To help enterprises solve this data collection and privacy problem, several vendors are adding AI technologies, such as machine learning, to their data management products. This includes Infogix, the company providing data management technology for Ruffalo Noel Levitz.

"We use AI to dynamically detect where information is private and where governance is needed," said Emily Washington, senior vice president of product management at Infogix. "For example, we can identify whether the information is a U.S. Social Security number or an email address or a street address. And we do scoring to detect when information may be of a private nature."
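The article does not describe how Infogix's detection works, so the following Python sketch should be read only as a simplified, rule-based stand-in for the general idea: score a column of values against patterns for a few private data types and flag it when the score is high enough. The patterns, the `threshold` value and the function names are all assumptions, and a production system would rely on far more robust models than these regular expressions.

```python
import re

# Deliberately simple, hypothetical patterns for three private data types.
PATTERNS = {
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "street_address": re.compile(
        r"^\d+\s+\w.*\b(st|street|ave|avenue|rd|road|blvd|dr|drive|ln|lane)\.?$",
        re.IGNORECASE,
    ),
}


def score_column(values):
    """Return, per data type, the fraction of non-empty values that match."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return {name: 0.0 for name in PATTERNS}
    return {
        name: sum(1 for v in cleaned if pattern.match(v)) / len(cleaned)
        for name, pattern in PATTERNS.items()
    }


def classify_column(values, threshold=0.8):
    """Name the best-matching private data type, or None if below threshold."""
    scores = score_column(values)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical column samples pulled from different applications.
ssn_guess = classify_column(["123-45-6789", "987-65-4321", ""])
address_guess = classify_column(["12 Main St", "400 Oak Avenue"])
mixed_guess = classify_column(["a@b.com", "c@d.org", "n/a"])
```

Scoring a whole column rather than a single value lets a few junk entries pass without hiding the column's nature, while the threshold keeps a mostly unrelated column from being flagged on the strength of one lucky match.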

Infogix has a team dedicated to staying on top of regulations, she added, and it has been doing that for a long time. With customers in many different industry verticals, that also includes a lot of very specific regulations.

The company can also help enterprises looking to find data stored in different systems in different formats.

"If you have a hundred applications and you're checking for something like an email address, being able to know how that email address lives across those hundred applications is difficult," Washington said. "Machine learning will help in a more dynamic way."

Some companies may decide to build AI-powered data management systems from scratch. Usually, though, going with a vendor has some advantages. First, vendors already have connections built to get data from the most common database platforms and cloud vendors. Second, vendors have a set of pretrained data models to identify common data types. But working with an outside vendor also creates its own problems, such as putting your data in yet another place or risking exposure of your trade secrets.

At Integris Software, for example, customers currently use its tools either on premises or in private clouds.

"Most of our customers are Fortune 1000 companies and have their own infrastructure," said Raghu Gollamudi, the company's co-founder and CTO. "It's all behind their firewall."

Customers can also decide whether they will share any of the models created from their data with others.

Some models, for example, are generic tools that help the system identify something like an email address. Transferring these models to new data sets gives the AI a big head start in learning how to manage the collected data and identify privacy risks.

"But anything that's very specific to a customer or is based on highly sensitive information -- those models we will never use for transfer learning," Gollamudi said.

Not everyone is on board with applying AI to data collection and privacy efforts, however.

"AI is not yet to the point of hunting down personal data and neatly assigning it to an individual with a high enough degree of certainty to make it feasible," said Kon Leong, CEO and co-founder at ZL Technologies, a San Jose-based data management vendor. "Because even if it's right 90% of the time, the other 10% of the time it could come back to bite you."

And even when it works, companies should be careful about how they use it.

"AI works very well within its limitations, but it is by no means the magic answer to all compliance problems," said Rob Perry, vice president of product marketing at ASG Technologies, a Florida-based enterprise information management technology vendor.

In particular, organizations will need to keep a close eye on AI systems to catch any unintended bias, he said.

"The human factor is critical to avoiding ethical and regulatory missteps," Perry said.
