When Philip O'Brien wanted to start modeling employee churn at Paychex, he knew there was a lot of potential to reduce the number of employees who leave. But he also realized that if he wasn't careful about data privacy protection, there was also potential to run afoul of employment laws.
Replacing employees who leave the company for new jobs is expensive, which makes it a ripe field for predictive modeling, O'Brien, MIS and portfolio manager at Paychex, said in a presentation at the Predictive Analytics World conference in Boston. It's relatively simple to develop a model that predicts who is likely to leave and then develop an intervention to reduce the chances of that person looking for work elsewhere. But things get complicated when you take into consideration workplace anti-discrimination laws.
For this reason, O'Brien and his team had to be careful about what kind of variables they selected for their model at Paychex, which is a payroll services company in Rochester, N.Y. Traits like race, gender and age were disqualified right off the bat because federal laws prohibit employment decisions based on such factors.
But O'Brien said there were subtler decisions. For example, he said that a person's zip code could be predictive of their likelihood of leaving, but zip code can also often serve as a proxy for race. The team eventually settled on a set of variables that generally relate to how long employees have been with the company, what types of clients they work with, how heavy their workload is and whether they work in an office or primarily from home. O'Brien said he wanted to stay as far as possible from anything that could be construed as discriminatory.
"Your model doesn't care if it discriminates, but the laws do," O'Brien said. "You have to be really sure you aren't including discriminatory variables."
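The variable-screening step O'Brien describes can be pictured as a filter applied before any model is trained. The Python sketch below is illustrative only. Paychex has not published its feature list, so the column names, the `screen_features` function and the proxy set are all hypothetical. In practice, deciding what counts as a proxy would take legal review and analysis of the actual data, not a hard-coded list.

```python
# Hypothetical column names; Paychex's actual variables are not published.
# Traits that federal law bars from employment decisions.
PROTECTED = {"race", "gender", "age"}
# Features judged to act as stand-ins for a protected trait
# (e.g. zip code as a proxy for race). This list is an assumption.
KNOWN_PROXIES = {"zip_code"}


def screen_features(candidates):
    """Return candidate features, in order, minus protected attributes and known proxies."""
    excluded = PROTECTED | KNOWN_PROXIES
    return [name for name in candidates if name not in excluded]


candidates = ["tenure_months", "client_type", "workload",
              "works_remotely", "zip_code", "age"]
print(screen_features(candidates))
# ['tenure_months', 'client_type', 'workload', 'works_remotely']
```

A fixed exclusion list only catches proxies someone has already thought of, which is why O'Brien's team erred toward leaving out anything that could be construed as discriminatory.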
Data privacy is a matter of trust
The fundamental data privacy problem comes down to how you use information about people, and it's one that a growing number of businesses face as their reliance on big data grows. The more data you have about people, the more you can model their behavior. But doing so risks alienating them.
O'Brien said he was concerned about more than anti-discrimination laws. Once he and his team had accounted for possible sources of bias in the model, they also had to think about how to use it. He said grading individuals on their propensity to leave could lead to bigger problems down the line. For that reason, they decided to anonymize individual scores: Paychex corporate managers can see only aggregates by branch office. This allows them to implement churn-reduction interventions at the office level rather than singling out individuals.
"I don't know about you, but I would not want to know there's a model out there that's giving me a D or F score," O'Brien said.
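The office-level reporting O'Brien describes amounts to collapsing individual scores into branch averages before managers see anything. Below is a minimal Python sketch under stated assumptions: the `branch_aggregates` function and the branch names are hypothetical, and the minimum group size is an added safeguard (an average over two or three people can come close to revealing one person's score), not something O'Brien described.

```python
from collections import defaultdict
from statistics import mean


def branch_aggregates(scores, min_group_size=5):
    """Collapse individual churn scores into per-branch average scores.

    scores: iterable of (branch_id, score) pairs. No employee identifiers
    are passed in, so none can appear in the output.
    Branches with fewer than min_group_size scores are left out entirely;
    this threshold is an assumption added for illustration.
    """
    by_branch = defaultdict(list)
    for branch, score in scores:
        by_branch[branch].append(score)
    return {
        branch: round(mean(values), 3)
        for branch, values in by_branch.items()
        if len(values) >= min_group_size
    }


scores = [("branch_a", 0.25), ("branch_a", 0.75), ("branch_b", 0.9)]
print(branch_aggregates(scores, min_group_size=2))  # {'branch_a': 0.5}
```

Here `branch_b` is dropped because a single-person "average" would simply be that employee's score.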
Trust keeps the digital economy afloat
Companies that don't respect the privacy wishes of those they collect data from could soon face a backlash, said Cameron Kerry, a lawyer with the firm Sidley Austin, which is based in Chicago. Kerry helped write the Obama administration's Consumer Privacy Bill of Rights.
Speaking at the Big Data Innovation Summit in Boston, Kerry said one of the main reasons why e-commerce was slow to take off in the 1990s was that people didn't trust it. Today, with large-scale data breaches making news so often, consumers are again starting to think about trust. They're asking why businesses have so much data about them, why companies aren't securing it, and what firms are doing about data privacy protection.
"Companies understand that they have a problem," Kerry said. "Target has brought that home to people. Sony has brought that home to people. But still not enough is being done. Trust is essential in the digital economy."
Kerry said he sees a limited role for regulations around data collection and analysis. The Consumer Privacy Bill of Rights lays out rules for businesses, but he described it as more of a starting point. In his view, if the economy is to keep being driven by technological innovation, regulations won't be able to keep up, which means the onus is on businesses to use data respectfully.
Privacy issues related to predictive modeling may only grow as data volumes increase and businesses gather data from a wider range of sources, said consultant and author Dale Neef. For him, self-policing is the most effective approach to privacy currently available. But he doesn't believe every business is motivated to follow through on it. And with little political movement on the issue, he expects the status quo to endure.
"I think the whole concept of privacy is certainly different than it's ever been," Neef said. "It might be lost."