Three years ago, Montreal-based Yellow Pages Ltd. started expanding its use of Hadoop. The company, which offers a variety of mobile apps and digital marketing services in addition to traditional telephone directories in Canada, was in the process of moving some outsourced analytics applications back in-house, and one of the applications used a Hadoop cluster as its data transformation tool.
But after that application, which calculates and reports on ROI metrics for clients that advertise with Yellow Pages, was replicated internally, Richard Langlois, the company's director of big data and analytics, noticed that the Hadoop cluster was underutilized. It was basically just being used to stage data for the ROI application, which only took about three hours per day. Langlois wondered if the cluster, based on Cloudera's Hadoop distribution, could be put to use throughout the rest of the day -- essentially, as a Hadoop BI system.
"When we brought back the application, the Hadoop part was simply used to sort our records -- it was used as an ETL machine," he said, referring to extract, transform and load data integration processes. Langlois decided to see if he could tune the cluster to also run more traditional BI applications on the Hadoop platform.
Choose your tools with users in mind
Langlois eventually brought in software vendor AtScale's namesake technology, which aggregates frequently queried Hadoop data and manages it in a server's memory. That is designed to make the data quicker to access than it is through traditional Hadoop queries, which are typically optimized for large-scale batch operations rather than fast, interactive lookups.
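The general idea of keeping an aggregate of frequently queried data in memory can be sketched in a few lines of Python. This is an illustration only, not AtScale's actual implementation: the `AggregateCache` class, the `RAW_EVENTS` records and the field names are all hypothetical stand-ins for data that would normally sit in Hadoop.

```python
from collections import defaultdict

# Hypothetical raw records, standing in for rows stored in Hadoop.
RAW_EVENTS = [
    {"advertiser": "acme", "day": "2015-10-01", "clicks": 12},
    {"advertiser": "acme", "day": "2015-10-02", "clicks": 7},
    {"advertiser": "bolt", "day": "2015-10-01", "clicks": 5},
]


class AggregateCache:
    """Builds a per-advertiser click total once, then answers from memory."""

    def __init__(self, scan):
        self._scan = scan      # callable that performs the (slow) full scan
        self._totals = None    # in-memory aggregate, built on first use
        self.scans = 0         # counts how many full scans were needed

    def clicks_for(self, advertiser):
        if self._totals is None:
            # First query: scan all raw rows and aggregate them.
            totals = defaultdict(int)
            for row in self._scan():
                totals[row["advertiser"]] += row["clicks"]
            self._totals = dict(totals)
            self.scans += 1
        # Later queries are served from the in-memory aggregate.
        return self._totals.get(advertiser, 0)


cache = AggregateCache(lambda: RAW_EVENTS)
print(cache.clicks_for("acme"), cache.clicks_for("bolt"), cache.scans)
```

In this sketch only the first query pays for a full scan; a real system would also have to decide when to refresh the aggregate as new data arrives.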
One of the biggest decisions Langlois made in deploying the AtScale software and moving ahead on the Hadoop analytics effort was to meet his users where they were. That meant leaving in place an existing deployment of Tableau's BI tools. At the start of the project, Langlois envisioned standardizing the entire company on Information Builders' WebFOCUS software, which had already been implemented for some reporting needs, as the front-end tool for analyzing the Hadoop data. But he found that the marketing department at Yellow Pages was already using Tableau.
Langlois said the subsequent decision to let marketing continue with Tableau as its go-to BI-on-Hadoop tool, while other departments use WebFOCUS, helped smooth adoption and quickly provide value to business users. "Our strategy is to allow the business to use the tools they deem appropriate to operate," he noted.
In addition to the ROI calculator application, Yellow Pages is now using the Hadoop and BI setup for things such as giving corporate customers information about how their digital ads rank on user engagement against ones placed by their competitors.
Don't skip Hadoop BI governance steps
Exposing Hadoop data to a greater number of business users created some governance concerns for Langlois and his team. The idea of using Hadoop as a data lake continues to gain ground in organizations, as more businesses view Hadoop clusters as a relatively cheap storage option for new types of unstructured and semi-structured data that can fuel expanded BI and big data analytics initiatives. But it would be easy to dump potentially sensitive data into Hadoop without putting proper controls on who can access it and how it can be used.
Langlois said he didn't want to let his Hadoop cluster turn into an ungoverned data store. Rather than automatically feeding data from source systems into Hadoop, a member of his engineering team looks at everything before it goes in. The engineer applies metadata and security tags so that the data is organized and can be made available to or withheld from individual BI users based on their roles and permissions.
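A minimal Python sketch of that kind of tag-based gatekeeping is shown here. It is not Yellow Pages' actual setup: the `Dataset` class, the role names, the tag names and the `visible_datasets` function are hypothetical, and the rule that untagged data stays hidden is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    name: str
    description: str                       # descriptive metadata
    security_tags: frozenset = field(default_factory=frozenset)


# Hypothetical mapping of user roles to the security tags they are cleared for.
ROLE_CLEARANCES = {
    "marketing_analyst": frozenset({"clickstream"}),
    "data_engineer": frozenset({"clickstream", "client_confidential"}),
}


def visible_datasets(role, catalog):
    """Return names of datasets whose tags are all within the role's clearance.

    Assumption: untagged datasets are withheld from everyone, on the theory
    that nothing should be exposed before an engineer has reviewed it.
    """
    cleared = ROLE_CLEARANCES.get(role, frozenset())
    return [
        d.name
        for d in catalog
        if d.security_tags and d.security_tags <= cleared
    ]


catalog = [
    Dataset("ad_impressions", "Daily ad impression counts",
            frozenset({"clickstream"})),
    Dataset("client_roi", "ROI metrics per advertiser",
            frozenset({"client_confidential"})),
    Dataset("new_feed", "Not yet reviewed"),  # no tags yet, so hidden
]

print(visible_datasets("marketing_analyst", catalog))
print(visible_datasets("data_engineer", catalog))
```

Defaulting to "hidden until tagged" mirrors the review-before-load approach described above, though a production cluster would enforce this in the platform's own security layer rather than in application code.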
The data being fed into the Hadoop platform is mainly clickstream records and other Web metrics, such as info on ad impressions. Langlois said keeping things properly organized is particularly important when the cluster is being used as a back-end BI system. The goal, he added, is to create "a solid analytics data foundation" for business users at Yellow Pages.
Sky's the limit for using Hadoop data
Organizing the data in Hadoop for BI applications has also had the side benefit of staging it for other analytics uses. For example, Langlois and his team recently implemented the Spark processing engine on top of Hadoop for a machine learning application. He said the application learns how successful marketing campaigns were structured from the company's experience with previous Yellow Pages clients. It then takes information about new clients, such as their industry and regional location, and prescribes specific marketing strategies. The analytics team is also looking at ways to use similar machine learning techniques to move beyond selling ad placements to selling customer leads directly to businesses.
These ongoing developments are being driven in part by a strategy of making it easier for product managers and data scientists alike to query data for Hadoop BI uses. In turn, Langlois said the analytics efforts are helping Yellow Pages to maintain its business relevance in the 21st century; in fact, the company got 61% of its 2015 third-quarter revenue from digital offerings. "It's a totally different Yellow Pages from five years ago," he said. "This is why analytics is important. It changes our products and improves our processes."