Tableau is generally thought of as a lightweight piece of software with fairly simple functionality. But users are increasingly putting it to work in more complicated big data environments, pushing the limits of what the data visualization tool can do.
"Data in itself is pretty meaningless. Insights need to be visual," said Gaurav Kumar, product lead in the data science engineering team at GoPro, based in San Mateo, Calif., in a presentation at Tableau Conference 2016.
He described the company's big data challenge, which includes bringing together data from a range of hardware and software platforms. Each GoPro camera sends log data back to GoPro servers, so the company can keep track of how people are using its products. The same goes for the video-editing software that comes with the cameras. Then there are more conventional data sources, such as CRM systems and other business applications.
Kumar and his team have worked to bring all this data into one platform, so it can play a role in directing product development and marketing initiatives. The big data environment starts by streaming log files into an HBase database using Kafka and Spark Streaming. Extract, transform and load jobs pull this data, as well as data from CRM and ERP systems, into a Hive data store. Data is further refined and passed to a data mart built using Cloudera Impala, which can be accessed using Tableau.
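To make the staged flow concrete, here is a toy, in-memory Python sketch with the same shape: parse streamed log lines, enrich them with CRM attributes, then aggregate the result into a small table a BI tool could query. It is not GoPro's code. The record fields (`device_id`, `event`, `region`) and the function names are hypothetical, and plain lists and dicts stand in for Kafka, Spark Streaming, HBase, Hive and Impala.

```python
# Toy, in-memory sketch of a staged pipeline: raw logs -> enriched store -> data mart.
# All names and records are hypothetical; a real deployment would use Kafka,
# Spark Streaming, HBase, Hive and Impala instead of Python lists and dicts.
from collections import defaultdict


def ingest_logs(raw_lines):
    """Stage 1: parse streamed log lines of the form "device_id,event"."""
    records = []
    for line in raw_lines:
        device_id, event = line.strip().split(",")
        records.append({"device_id": device_id, "event": event})
    return records


def etl_join(log_records, crm_rows):
    """Stage 2: enrich each log record with a CRM attribute keyed by device_id."""
    region_by_device = {row["device_id"]: row["region"] for row in crm_rows}
    return [
        {**rec, "region": region_by_device.get(rec["device_id"], "unknown")}
        for rec in log_records
    ]


def build_mart(enriched):
    """Stage 3: aggregate event counts per (region, event) for reporting."""
    counts = defaultdict(int)
    for rec in enriched:
        counts[(rec["region"], rec["event"])] += 1
    return dict(counts)


logs = ["cam1,record", "cam2,record", "cam1,upload"]
crm = [{"device_id": "cam1", "region": "US"}, {"device_id": "cam2", "region": "EU"}]
mart = build_mart(etl_join(ingest_logs(logs), crm))
print(mart)  # {('US', 'record'): 1, ('EU', 'record'): 1, ('US', 'upload'): 1}
```

Keeping each stage a separate function mirrors the idea that raw ingestion, enrichment and reporting-ready aggregation are distinct layers, each of which can be swapped for a larger system.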
Tableau takes on big data problems
This may sound like a long and winding road for data to get into Tableau, which is more commonly used in desktop deployments to analyze smaller data files or in single-server setups. But Kumar said once you get the back-end systems straightened out, the software can be a useful tool for accessing and reporting on data.
"We have a variety of data," Kumar said. "The challenge was to bring data from many places, do transformations and make it available to analysts."
Part of the reason Tableau works in big data environments is how it queries data: it essentially retrieves data by generating SQL queries and sending them to the source. Once data sits in a database that can be queried with SQL, Tableau can access it.
"There's no magic," said Jason Flittner, senior analytics engineer in the content data engineering and analytics team at Netflix, based in Los Gatos, Calif. "When it comes down to it, Tableau is writing SQL and sending it to your database."
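As a rough illustration of Flittner's point, the sketch below turns a drag-and-drop style request (one dimension, one measure) into a `GROUP BY` query and sends it to a SQL database. It does not reproduce the SQL Tableau actually generates. The helper `build_query`, the `sessions` table and its columns are hypothetical, and Python's built-in `sqlite3` stands in for whatever SQL-capable store sits behind the tool.

```python
import sqlite3


def build_query(table, dimension, measure, agg="SUM"):
    """Hypothetical helper: turn a visual spec into an aggregate SQL query.

    Identifiers are assumed to come from a known schema, not user input.
    """
    allowed_aggs = {"SUM", "AVG", "COUNT", "MIN", "MAX"}
    if agg not in allowed_aggs:
        raise ValueError(f"unsupported aggregate: {agg}")
    return (
        f'SELECT "{dimension}", {agg}("{measure}") AS value '
        f'FROM "{table}" GROUP BY "{dimension}" ORDER BY "{dimension}"'
    )


# An in-memory SQLite database plays the role of "your database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (title TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("Show A", 30), ("Show A", 45), ("Show B", 20)],
)

sql = build_query("sessions", "title", "minutes")
rows = conn.execute(sql).fetchall()
print(rows)  # [('Show A', 75.0), ('Show B', 20.0)]
```

Because the visualization layer only emits SQL, the same request works against any backend that accepts that SQL, which is why the back-end plumbing matters more than the front end.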
Flittner's team uses Tableau to analyze how users are engaging with content on Netflix. This helps inform decisions about what types of programming to produce or acquire. Data primarily comes from user sessions and includes things like what people watch, whether they finish a title and whether there are parts that viewers tend to skip.
With about 75 million streaming customers, Netflix generates a huge amount of this data. All of it is first loaded into Amazon Simple Storage Service (S3). The team uses Hadoop to process the data and a combination of Hive, Spark and Presto as query interfaces, each of which supports its own dialect of SQL. This makes the information accessible to Tableau, as well as to other tools like MicroStrategy and the Python programming language.
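One small, concrete way the SQL dialects of Hive, Spark and Presto differ is identifier quoting: HiveQL and Spark SQL quote identifiers with backticks, while Presto follows the ANSI double-quote convention. The sketch below renders one logical query for each engine. It is a simplified illustration, not how Netflix or Tableau handles dialects; the `view_events` table, the `title` column and both helper functions are hypothetical.

```python
# Hypothetical helper table: identifier quote character per engine.
QUOTES = {
    "hive": "`",    # HiveQL quotes identifiers with backticks
    "spark": "`",   # Spark SQL also uses backticks
    "presto": '"',  # Presto follows the ANSI SQL double-quote convention
}


def quote_ident(name: str, dialect: str) -> str:
    """Quote an identifier for the given dialect."""
    q = QUOTES[dialect]
    # Keep the sketch simple: reject names that would need escaping.
    if q in name:
        raise ValueError(f"identifier {name!r} contains {q!r}")
    return f"{q}{name}{q}"


def top_titles_sql(dialect: str, limit: int = 10) -> str:
    """Render one logical query ("most-viewed titles") in a given dialect."""
    table = quote_ident("view_events", dialect)  # hypothetical table name
    title = quote_ident("title", dialect)        # hypothetical column name
    return (
        f"SELECT {title}, COUNT(*) AS views FROM {table} "
        f"GROUP BY {title} ORDER BY views DESC LIMIT {int(limit)}"
    )


for d in ("hive", "spark", "presto"):
    print(d, "->", top_titles_sql(d, limit=5))
```

Apart from quoting, the query body here is common to all three engines, which is what lets one SQL-speaking client reach data through any of them.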
Data visualization brings big data to the masses
At Ebates Inc., a website that helps online shoppers find coupons and rebates, the analytics team wanted a platform that was fast, flexible, scalable and cheap. They chose to build the big data environment around Hadoop, which satisfied the scalability and cost concerns. But making the data accessible quickly was another matter. For this, they turned to software from AtScale, which makes Hadoop data accessible to SQL-based BI tools like Tableau.
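The general idea of putting a layer between Hadoop and BI tools can be sketched as a small logical model: business-friendly names map to physical columns and expressions, and each request is compiled into SQL against the underlying table. This is a minimal sketch of that idea, not AtScale's product or API. The model contents, the `raw_clicks` table and `compile_request` are hypothetical, and `sqlite3` stands in for the Hadoop-backed store.

```python
import sqlite3

# Hypothetical logical model: business names -> physical columns/expressions.
MODEL = {
    "table": "raw_clicks",
    "dimensions": {"Traffic Source": "src_cd"},
    "measures": {"Clicks": "COUNT(*)", "Revenue": "SUM(rev_amt)"},
}


def compile_request(dimension: str, measure: str, model: dict = MODEL) -> str:
    """Translate a business-level request into SQL over the physical table."""
    col = model["dimensions"][dimension]  # KeyError if not defined in the model
    expr = model["measures"][measure]
    return (
        f"SELECT {col} AS dim, {expr} AS val FROM {model['table']} "
        f"GROUP BY {col} ORDER BY {col}"
    )


# sqlite3 stands in for the Hadoop-backed store in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_clicks (src_cd TEXT, rev_amt REAL)")
conn.executemany(
    "INSERT INTO raw_clicks VALUES (?, ?)",
    [("search", 2.5), ("email", 1.0), ("search", 3.0)],
)

sql = compile_request("Traffic Source", "Revenue")
rows = conn.execute(sql).fetchall()
print(rows)  # [('email', 1.0), ('search', 5.5)]
```

Analysts ask for "Revenue by Traffic Source" without knowing that the physical column is `src_cd`, which is one way such a layer can widen access beyond the team that built the tables.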
Now the company uses Hadoop as its centralized data hub and runs BI reporting on top of it, for tasks such as classifying web traffic data. Mark Stange-Tregear, director of analytics at Ebates, based in San Francisco, said combining the data-processing power of Hadoop with Tableau's simple reporting is helping open up deeper data stores to a wider audience within the company.
"One of the difficulties I'm facing now as we try to expand use beyond the analytics team is to make sure that everyone understands they can get anything," he said.