While the U.S. Supreme Court recently ruled that pharmaceutical and data mining companies have a First Amendment right to buy and use doctors’ prescription data, broader questions remain about data mining, data privacy and how organizations should craft policies around them.
Earlier this summer, in Sorrell v. IMS Health, the Supreme Court struck down a Vermont law that barred pharmacies and data mining companies from selling a doctor’s prescription history to drug makers for marketing without the physician’s consent. The high court ruled that the law violated free speech protections, in part because it still allowed the same prescription information to be used for research or educational purposes.
Although patient identifiers such as names and addresses are removed from the data, doctors’ names and the drugs they prescribe aren’t, enabling data miners (or, as high-tech industry experts refer to them, data syndicators or data aggregators) to uncover and sell prescribing patterns to drug representatives for the purposes of targeting sales pitches.
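To make that data flow concrete, here is a minimal Python sketch of the kind of transformation described above. The record layout and field names are hypothetical, invented for illustration: patient identifiers are dropped, but the prescriber and drug remain, which is enough to tally prescribing patterns per doctor.

```python
from collections import Counter

# Hypothetical field names; real pharmacy records use their own schemas.
PATIENT_IDENTIFIERS = {"patient_name", "patient_address"}

def strip_patient_identifiers(record: dict) -> dict:
    """Return a copy of the record without patient-identifying fields."""
    return {k: v for k, v in record.items() if k not in PATIENT_IDENTIFIERS}

def prescribing_patterns(records: list[dict]) -> Counter:
    """Count prescriptions per (prescriber, drug) pair after stripping patient fields."""
    cleaned = [strip_patient_identifiers(r) for r in records]
    return Counter((r["prescriber"], r["drug"]) for r in cleaned)

# Made-up sample records for illustration only.
records = [
    {"patient_name": "A. Smith", "patient_address": "1 Main St", "prescriber": "Dr. Lee", "drug": "DrugX"},
    {"patient_name": "B. Jones", "patient_address": "2 Oak Ave", "prescriber": "Dr. Lee", "drug": "DrugX"},
    {"patient_name": "C. Brown", "patient_address": "3 Elm Rd", "prescriber": "Dr. Park", "drug": "DrugY"},
]

patterns = prescribing_patterns(records)
print(patterns.most_common(1))  # [(('Dr. Lee', 'DrugX'), 2)]
```

The doctor’s identity survives the stripping step, and that is precisely what makes the resulting pattern useful for targeting sales pitches.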
Leslie Ament, vice president of customer intelligence research and client advisory services at Hypatia Research LLC in Lexington, Mass., cautioned that the ruling will have broader implications for information regulation and security, not to mention the legal brouhahas she predicts will arise from possible violations of the Health Insurance Portability and Accountability Act (HIPAA).
“I don’t think data accessibility is the problem or that there’s too much of it,” she said.
Analysts such as Ament, along with vendors and end users, generally agree that there is no such thing as too much data, especially when it comes to uncovering ineffective business processes or new sales opportunities. That view, however, usually comes with a caveat: organizations must also consider the consequences of collecting and analyzing it.
“The issue is when rulings come down like this one without a complete understanding of what might happen downstream, there’s a high probability that laws, such as HIPAA, meant to protect people’s private medical information, will be violated because it’s operationally possible," Ament said. "It’s sort of like ‘Why did the chicken cross the road?’ Because it’s there.”
As businesses prepare to delve deeper into data, analysts and experts warn that along with “big data” comes big responsibility.
Data mining in the age of big data
Big data is typically described as information growing exponentially in velocity, variety and volume, reflecting the recent flood of content from social media and online comments, not to mention data businesses already own but have been unable to use. The emphasis on big data has been accompanied by an influx of sophisticated tools, such as the open source Hadoop framework, that promise to help make sense of it all.
Analysts and experts say businesses should be open and honest about policies involving the collection of personal information.
“I think businesses need to have a clear and strong opt in/opt out-type of policy,” said Bruce Temkin, managing partner for the customer experience research and consulting firm Temkin Group.
Temkin, who defines today’s data-rich environment as a push-pull scenario between privacy and accessibility, believes a major transition is under way from an older generation that regards personal information as private and a younger one that chronicles intimate life details in shared online spaces such as blogs, Twitter and Facebook.
Still, Temkin said, information can be a double-edged sword for businesses, and an appropriate-use policy can help define gray areas.
“If businesses are going to use information, it should be used in a way that delivers value,” he said.
That’s a big adjustment for most businesses, according to Greg Pemberton, a privacy and compliance expert and a member of the corporate counsel team at Iron Mountain, a data management firm based in Boston.
“As we move into the era of big data, when so much is being collected and retained, the first point anyone should be thinking about is their actual data collection,” he said.
Businesses should gather data more selectively, based on its worth, rather than scooping up and storing everything possible, and that can’t happen without a good data use policy and education, Pemberton said.
To begin, businesses need a policy that strikes a delicate balance between vagueness and suffocating restriction. Pemberton recommends that the policy come from the business owner, who can best articulate the organization’s data needs. In Pemberton’s experience, security and privacy personnel, and even IT and finance departments, may also be asked to participate.
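One way to picture “gathering data based on worth” is an intake allowlist: every field the business keeps must be mapped to a documented purpose, and anything else is discarded before storage. The following is a minimal Python sketch under that assumption; the policy contents, field names and the `minimize` function are hypothetical illustrations, not Pemberton’s or Iron Mountain’s method.

```python
# Hypothetical policy: each field the business may retain, mapped to the purpose
# that justifies collecting it. Anything not listed is discarded at intake.
RETENTION_POLICY = {
    "customer_id": "account management",
    "email": "order confirmations",
    "purchase_total": "revenue reporting",
}

def minimize(record: dict, policy: dict[str, str]) -> tuple[dict, set[str]]:
    """Keep only fields with a documented purpose; return (kept record, dropped field names)."""
    kept = {k: v for k, v in record.items() if k in policy}
    dropped = set(record) - set(kept)
    return kept, dropped

# Made-up incoming record with more detail than the policy justifies.
raw = {
    "customer_id": 1017,
    "email": "pat@example.com",
    "purchase_total": 42.50,
    "browsing_history": ["/shoes", "/hats"],
    "date_of_birth": "1980-01-01",
}

kept, dropped = minimize(raw, RETENTION_POLICY)
print(sorted(kept))     # ['customer_id', 'email', 'purchase_total']
print(sorted(dropped))  # ['browsing_history', 'date_of_birth']
```

Logging the dropped field names, rather than silently discarding them, gives the policy owner a record of what the business chose not to keep.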
Once an authoritative statement has been constructed, businesses should reinforce the data policy through education. Pemberton said best practices dictate using a range of techniques, from casual cues -- such as email, news blips and even pop-ups on an employee’s computer screen when logging in -- to more structured methods.
“Leaders need to identify that this is a concern for the company and that [their employees] should be responding to it,” Pemberton said.
Ultimately, the data policy should echo an organization’s core values, according to Adrian Alleyne, director of market research for the Gaithersburg, Md.-based DecisionPath Consulting.
“Companies need to ask themselves, ‘If our data warehouse policy was printed on the front page of the Wall Street Journal, would we embarrass ourselves?’ ” he said.
“That’s really going to drive everything else,” Alleyne said. “The availability of the sensitive information is going to be there; it’s up to you -- your internal policies and processes -- to determine how it gets used.”
At least for now. Temkin predicts the struggle between information protection and openness will end within 10 years, and customers are already advocating for this to happen. Temkin performed a major study of customer perception a few years ago for a client in the banking sector, and customers told him they wanted their bank to understand their needs and preferences the way that, say, Netflix does.
“In the old days, it was spooky if a business acted as though it knew its customers," he said. "Nowadays, people expect it.”
Netflix uses an algorithm to recommend movies based on a customer’s genre selections and rating history. Customers hand over this information freely, expecting movie suggestions in return, but businesses are also using data that customers may not knowingly provide.
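As a rough illustration of recommending by genre and rating history, here is a minimal Python sketch. It is not Netflix’s actual algorithm, which is far more sophisticated; the catalog, titles and function names are hypothetical. It averages a customer’s ratings per genre, then ranks unwatched titles by the customer’s affinity for each title’s genre.

```python
from collections import defaultdict

# Hypothetical catalog: title -> genre. This only illustrates the
# "genre selection plus rating history" idea, not any real service's system.
CATALOG = {
    "Alien": "sci-fi",
    "Arrival": "sci-fi",
    "Blade Runner": "sci-fi",
    "Airplane!": "comedy",
    "Clue": "comedy",
    "Heat": "crime",
}

def genre_affinity(ratings: dict[str, int]) -> dict[str, float]:
    """Average the customer's star ratings per genre."""
    totals: dict[str, list[int]] = defaultdict(list)
    for title, stars in ratings.items():
        totals[CATALOG[title]].append(stars)
    return {genre: sum(s) / len(s) for genre, s in totals.items()}

def recommend(ratings: dict[str, int], k: int = 2) -> list[str]:
    """Suggest unseen titles, ranked by the customer's affinity for each title's genre."""
    affinity = genre_affinity(ratings)
    unseen = [t for t in CATALOG if t not in ratings]
    # Genres the customer has never rated get an affinity of 0.
    unseen.sort(key=lambda t: affinity.get(CATALOG[t], 0.0), reverse=True)
    return unseen[:k]

history = {"Alien": 5, "Clue": 2, "Arrival": 4}
print(recommend(history))  # ['Blade Runner', 'Airplane!']
```

Everything this sketch uses is data the customer supplied deliberately by rating titles, which is the distinction the article draws before turning to data customers may not knowingly share.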
For example, in a recent effort to increase the value of its advertising space, Yahoo focused on the technical aspects of data analytics and management, but it also analyzed, where possible, a visitor’s demographics, location, what site he or she is coming from and search history, according to a presentation by David Mariani, Yahoo’s then-vice president of user data and analytics, at Gartner’s Business Intelligence Summit last May. The information is used to present the most effective and relevant banner advertisements possible.
How much is too much?
Just because businesses can collect this kind of data, should they? M. Ryan Calo, the director of the Consumer Privacy Project at the Stanford Law School Center for Internet and Society, is unconvinced that data-hungry businesses are moving in the right direction.
“Data mining is one thing, but what companies are doing is data strip mining,” he said, borrowing the turn of phrase from Chris Palmer of the San Francisco-based Electronic Frontier Foundation. “And they’re doing it as much as possible, searching for as many links as they can without any regard if they really need that level of detail.”
He sees a ripple effect happening as more businesses begin to realize the potential of monetizing the information they have; to do so, organizations don’t have to search hard for a vendor offering sophisticated techniques and assurances of increased revenue.
Instead of gobbling up expensive tools, Calo would like to see research and empirical studies proving that, for example, following users around from one website to another adds value to the business.
“Everyone benefits from empirical examinations of whether more data means better ads,” he said.
Value isn’t a one-way equation. Iron Mountain’s Pemberton points out that data collection can carry costs that aren’t always visible. Collecting unneeded sensitive information increases the risk that employees will use that data to make decisions -- a matter that could wind up in litigation. And even collecting sensitive data that is genuinely needed brings compliance costs.
“If businesses haven’t considered compliance costs, they are looking at the wrong equation,” Pemberton said. “They are looking at the upside and not the downside of data collection.”
While Calo points out the irony that such research will ultimately generate even more data, he believes a fine but important line exists between delivering a service and protecting a consumer’s privacy, and that line should be respected.
“Information ecologies are like any ecology, which means that you have to think carefully or the whole thing will collapse,” he said. “There is a role for analytics and data mining, but it has to be done responsibly.”