CDPQ, a Montreal-based company that manages public-sector pension funds in the province of Quebec, is navigating a tricky issue: how to oversee the work of its data scientists and business analysts on analytics applications without stifling them.
As part of a new data management architecture designed to help boost its analytics capabilities, CDPQ is setting up segregated data sandboxes inside data warehouses for exploration and prototyping by analysts. During a session at the 2015 TDWI Executive Summit in Las Vegas, Luc Veillette, the company's senior director of modeling and business analytics, said the goal is to clear the way for the analysts to quickly develop algorithms that provide answers to business questions -- ultimately leading to better decision-making on investment planning and portfolio management.
But it isn't carte blanche in the sandboxes. Veillette said CDPQ's data governance program includes a set of rules on using them -- for example, analysts must tap official company data sources if available, and analytical models need to be vetted during an audit stage. He and other managers also keep tabs on analytics activities.
"We try to have a centralized knowledge of what is done by our data scientists," he said, adding that CDPQ wants to make sure analytics processes are documented and data isn't being misused. Even so, the governance effort isn't heavy-handed, according to Veillette -- it's a collaborative undertaking aimed at enabling business units to become more data-driven in a consistent way.
Intuit Inc., the developer of personal finance software, has taken that kind of collaboration to another level. Two years ago, Intuit made its 150-member analytics group part of the legal department and teamed up corporate lawyers with analytics managers, data scientists and other employees to create rules for accessing and analyzing different sets of customer data tied to its various product lines.
Different priorities on analytics apps
Laura Fennell, Intuit's senior vice president, general counsel and secretary, said the move was motivated by a desire to expand the use of analytics applications to drive product development and marketing strategies -- while safeguarding the personal information of customers and trying to avoid any perceptions of data misuse that could harm the Mountain View, Calif., company's reputation.
"We had to do it right," Fennell said. "The benefit [of analytics] is great, but so is the risk. Our customer trust is everything to the brand." Ultimately, she added, "it's our customers' data, not ours." And the amount of information involved is huge: as of early this year, it added up to 6 petabytes of data on more than 50 million customers.
During a joint presentation with analytics team leader Loconzolo at the Strata + Hadoop World 2015 conference in San Jose, Calif., Fennell acknowledged that there was "a rocky start to our relationship." The legal team saw a lot of sensitive data that needed to be protected, while the data scientists questioned how they could do innovative analytics work with lawyers involved in the process. Even now, Loconzolo said, why the analytics team is in the legal department "is a constant question that we answer all the time," including when he and other managers are recruiting new analysts.
But the pairing has proven to be beneficial, said Loconzolo, whose title is vice president of data engineering. Before, the analytics team was working to nail down customer data protection measures with individual business units on a one-by-one basis, with the legal team providing input after initial technical decisions were made. That was a slow and "extremely painful" process that would have taken years to complete across the company's current roster of 64 products, Loconzolo said. Intuit has been able to expedite the work -- and get more data into a private cloud setup for analysis -- by centralizing it and bringing in the lawyers up front to vet raw data from the business units.
The lawyers also had to adjust their thinking as part of the new process. "Our job had to change from just saying no, to saying how we could make things work" on providing access to data, Fennell said. The goal, she noted, wasn't to lock down customer data completely but to figure out how to make appropriate amounts of it available so the analysts could do their jobs. To avoid having the two sides pulling in different directions, they were given shared objectives around democratizing data access for analytics uses and shared accountability for achieving those objectives.
Rent the Runway Inc., a startup in New York that uses the Web to rent dresses and fashion accessories for weddings, parties and other events, has a lot less data to deal with than Intuit does -- but it has similar concerns about not taking wrong steps with the customer data in its systems.
"That's very much part of our thinking," said Vijay Subramanian, the company's chief analytics officer. "Our philosophy is, 'consumer , and trust .'" With Rent the Runway lacking its own legal department for now, he had outside attorneys on a retainer review online forms used to collect size information and other data that gets fed into the site's recommendation engine to help point website users to dresses they might like.
Limited time for analytics projects
The startup also has limited resources, and its business needs are evolving quickly. As a result, Subramanian tries to limit the development projects his team is working on to no more than three to six months. "Anything longer than that is a huge risk for the business to take," he said. "We don't have the luxury of time."
The data scientists at Rent the Runway use Python or the open source R programming language to write machine learning algorithms, including ones that power the recommendation engine and a demand forecasting system used to fine-tune pricing. To help keep the development process moving along, Subramanian has adopted a so-called minimum viable product methodology that initially limits algorithms to as few features as are needed for them to function effectively. The data scientists can then go back and add to the algorithms in another round of development, he said, adding that he wants to avoid "wandering in the desert aimlessly" on projects.
On the back end, Rent the Runway warehouses its data for analysis in an HP Vertica database, pulling together a mix of transactional data from a MySQL system, information on dress attributes from MongoDB's namesake NoSQL database, and JSON log files that track website activity. As the company's data volume grows, Subramanian said he expects to eventually add a Hadoop system as a repository for all the raw data in front of Vertica.
One thing that won't go out of fashion at the company, he said, is investing what's needed to make its analytics applications work for the business. Rent the Runway needs to be data-driven to succeed, he noted: "We look like a regular fashion company, but we have to have a good data story to convince consumers to rent instead of buy."
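As a rough illustration of the kind of consolidation described above, the following Python sketch joins three small in-memory samples into one flat record per dress: order rows standing in for transactional data, attribute documents standing in for a NoSQL store, and JSON log lines standing in for website activity. Every field name, value and function name here is invented for illustration; it is not Rent the Runway's actual pipeline, and a real setup would load the data into the warehouse rather than merge it in application code.

```python
import json
from collections import Counter

# Hypothetical sample data; all field names and values are invented.
orders = [  # stand-in for rows from a transactional database
    {"order_id": 1, "dress_id": "d1", "price": 80.0},
    {"order_id": 2, "dress_id": "d2", "price": 120.0},
    {"order_id": 3, "dress_id": "d1", "price": 75.0},
]
dress_docs = [  # stand-in for attribute documents from a NoSQL store
    {"_id": "d1", "color": "red", "length": "midi"},
    {"_id": "d2", "color": "black", "length": "gown"},
]
log_lines = [  # stand-in for JSON activity logs, one event per line
    '{"event": "view", "dress_id": "d1"}',
    '{"event": "view", "dress_id": "d1"}',
    '{"event": "view", "dress_id": "d2"}',
    "not valid json",
]


def count_views(lines):
    """Count 'view' events per dress, skipping lines that fail to parse."""
    views = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed log line: ignore it
        if isinstance(event, dict) and event.get("event") == "view":
            views[event.get("dress_id")] += 1
    return views


def build_dress_table(orders, dress_docs, log_lines):
    """Combine attributes, rental totals and view counts, keyed by dress ID."""
    views = count_views(log_lines)
    table = {}
    for doc in dress_docs:
        table[doc["_id"]] = {
            "color": doc["color"],
            "length": doc["length"],
            "rentals": 0,
            "revenue": 0.0,
            "views": views.get(doc["_id"], 0),
        }
    for order in orders:
        row = table.get(order["dress_id"])
        if row is None:
            continue  # order refers to a dress with no attribute document
        row["rentals"] += 1
        row["revenue"] += order["price"]
    return table


dress_table = build_dress_table(orders, dress_docs, log_lines)
for dress_id, row in dress_table.items():
    print(dress_id, row)
```

The point of the sketch is only that the three sources share a key (here, a dress ID) that lets them be pulled into a single table for analysis.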