
Visual data exploration a key first step for deeper analyses

Visual data analysis is an important first step in any advanced analytics project -- and analysts and data scientists who overlook it do so at their own peril, experienced users warn.

Visual data exploration may seem like Analytics 101, but analysts who skip this step may miss out on valuable insights and a deeper understanding of the data with which they are working.

"If something looks wrong in your data, it probably is wrong," said Tatiana Gabor, an analytics manager for the revenue team at music streaming company Spotify.

Visual data discovery tools consistently rank among analytics purchasers' top priorities. But the software is often deployed as an end in itself, with many businesses purchasing it as a self-service analytics tool for business users. In the hands of experienced data scientists, it can produce even deeper insights.

Data exploration is a recommended first step in any analysis, but analysts often just look at numbers: summary statistics like mean, median and spread. They don't always engage in visual data exploration.
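To illustrate why summary numbers alone can mislead, here is a minimal Python sketch. It is not from the presentation: the two samples are invented for this example, and `summarize` and `text_histogram` are hypothetical helper names. Both samples share the same mean, median and range, yet even a crude text histogram shows that one is evenly spread and the other is split into two clumps.

```python
from collections import Counter
from statistics import mean, median

# Invented samples for illustration only; not real usage data.
even = [1, 2, 3, 4, 5, 6, 7, 8, 9]
clumped = [1, 1, 1, 1, 5, 9, 9, 9, 9]


def summarize(values):
    """Return (mean, median, range) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)


def text_histogram(values):
    """Return one line per integer from min to max, with one '#' per occurrence."""
    counts = Counter(values)
    return [
        f"{v:>2} | {'#' * counts[v]}"
        for v in range(min(values), max(values) + 1)
    ]


if __name__ == "__main__":
    for name, sample in (("even", even), ("clumped", clumped)):
        print(name, summarize(sample))
        print("\n".join(text_histogram(sample)))
```

A fuller set of statistics, such as the standard deviation, would separate these two particular samples. The point of the sketch is only that a picture exposes the difference at a glance.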

Some analysts also bring a set of assumptions to data and test those right off the bat by running the data through a regression or clustering model. But jumping to these techniques first can cause an analyst to overlook important features of the data.

Look before you leap into analytics

In a presentation at Tableau Conference 2017 in Las Vegas, Gabor said her team of analysts starts every project by visually exploring the available activity data collected on Spotify users. The team analyzes patterns in user behavior to understand how people respond to changes in the Spotify platform and to develop new ways to keep users engaged.

The most important benefit of visual data exploration is that it enables analysts to assess the quality of their data, said Gabor, who works at Spotify's U.S. headquarters in New York. Outliers, or clusters of data points that look unrealistic in light of an analyst's domain knowledge, become immediately visible, she noted. Analysts can follow up on either of those issues and, if necessary, correct for them before beginning formal analysis.
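The kind of data quality check described here can also be approximated in code once a chart has raised suspicion. The following Python sketch is an assumption-laden illustration, not Spotify's method: the daily listening minutes are invented, the 1.5 x interquartile range rule is one common convention for flagging outliers, and `flag_outliers` and `flag_implausible` are hypothetical helper names.

```python
from statistics import quantiles

# Invented daily listening minutes for illustration only.
minutes = [32, 45, 38, 51, 40, 900, 36, 1500, 43, 39, -5]


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def flag_implausible(values, minimum=0, maximum=24 * 60):
    """Return values that domain knowledge rules out: a day has 1,440 minutes."""
    return [v for v in values if v < minimum or v > maximum]


if __name__ == "__main__":
    print("statistical outliers:", flag_outliers(minutes))
    print("impossible values:", flag_implausible(minutes))
```

The two checks answer different questions. A 900-minute day is statistically unusual but possible, while a negative value or a day longer than 1,440 minutes points to a collection or processing error.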

The visual approach also highlights important aspects of data sets. For example, it shows the "shape" of data, such as whether it has a normal distribution or a long tail in either direction. It can also illuminate correlations between two variables. Of course, correlation doesn't equate to causation, but identifying potential trends by visually exploring data can lead analysts to examine relationships between variables that they might not have thought to look at otherwise, according to Gabor and other conference speakers.
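The same two properties, the shape of a distribution and the correlation between two variables, can be given rough numeric counterparts to confirm what a chart suggests. This is a minimal Python sketch under stated assumptions: the sample values are invented, `skewness` uses the simple moment-based definition, and `pearson_r` computes the standard Pearson correlation coefficient. Neither function comes from the presentation.

```python
from math import sqrt
from statistics import mean


def skewness(values):
    """Moment-based skewness: above 0 suggests a long right tail, below 0 a long left tail."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented values for illustration only.
    sessions_per_week = [1, 1, 2, 2, 3, 50]
    print("skewness:", skewness(sessions_per_week))
    print("correlation:", pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A single coefficient is a summary like any other, so it works best as a follow-up to the chart rather than a replacement for it.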

Beware of missed insights

Peter Gilks, director of product insights for the Spotify revenue team, said during the presentation that any data analysis must stem from a hypothesis or a set of questions a company wants to answer. An analyst could start by just punching in queries written in R or Python -- but that approach may lead to missed insights, Gilks cautioned. He said visual data exploration allows analysts to better shape their hypotheses from the beginning by highlighting patterns or trends in the data.

That's particularly true at Spotify because of the amount of data available to its data scientists and analysts, Gilks said. The company collects a variety of user data from its app, including clickstream records; with 60 million paying subscribers and a total of 140 million active users, that adds up to an enormous volume of data.

But it isn't just organizations like Spotify that can benefit from upfront visual exploration of data, Gilks added. "Everyone who works with data can and should be using these techniques," he said. "If you don't do it, you may not see the forest for the trees."
