Published: 06 Oct 2016
Predictive modeling and other kinds of advanced analytics are done with powerful software built specifically for running complex algorithms on large data sets, such as programming languages like R and Python and analytics tools like SAS and IBM SPSS. But many data scientists and analytics managers will tell you that a lot of their work -- and ultimately their ability to provide useful information to business executives -- also relies heavily on more humble data visualization tools.
Far from being a bit player in analytics applications, data visualization fills several crucial roles throughout the process. From initial data exploration to development of predictive models to reporting on the analytical findings the models produce, data visualization techniques and software are key components of the data scientist's toolkit. Without them, analytics teams are engaging in a nearly impossible task that's tantamount to flying an airplane while blindfolded.
"Data visualization just makes our analyses so much more efficient," said Daqing Zhao, director of advanced analytics at Macys.com. "The human brain can only comprehend so much. The only way to see patterns is to use your eyes."
The advanced analytics team at Macys.com -- the San Francisco-based online arm of retailer Macy's Inc. -- is primarily responsible for the website's performance and features. The data scientists managed by Zhao build recommendation engines, perform A/B tests of new webpage layouts and help the marketing team plan and execute targeted email campaigns. They run a mix of machine learning and predictive modeling applications that require a variety of tools and approaches, and data visualization enters very early in the process.
In fact, Zhao said his team starts every job by visualizing the data it's working with. For example, the analysts might pull out some specific variables into a graph to see if there's any correlation between them. Or they'll chart basic summary statistics -- things like mean and median averages, data spread and standard deviation metrics -- to get a sense of the scope of the data. Exploring the data visually gives them a better idea of where to focus their attention when building analytical models than they could get by looking at a giant spreadsheet, Zhao said.
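As a rough illustration of that kind of first look, the sketch below computes the summary statistics mentioned above using only Python's standard library. The `summarize` helper and the daily order counts are hypothetical and are not drawn from Macys.com's data or tooling.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),  # simple measure of spread
    }


# Hypothetical daily order counts; one day is far above the rest.
daily_orders = [120, 135, 128, 410, 131, 126, 133]

summary = summarize(daily_orders)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

In this made-up sample, the mean sits well above the median, which is the sort of gap that would prompt an analyst to chart the data and look for an outlier before building a model on it.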
Some of the analysts use Tableau software to explore and visualize data sets. Others use visualization components built into more sophisticated analytics software such as SAS, R and the H2O open source machine learning platform. Some are even visualizing data directly in Excel spreadsheets. "We're tool-agnostic," Zhao said, adding that in an advanced analytics environment, it's best to support whatever tools your data scientists feel most comfortable using.
At BuildingIQ, an analytics services provider that helps building owners and facilities management companies predict and control their energy use, data visualization similarly helps the company's analysts narrow down data sets and guides their development of predictive models and algorithms. BuildingIQ, which was founded in Australia and is now based in San Mateo, Calif., collects data from the heating, ventilation and air conditioning (HVAC) systems in buildings; identifies power consumption trends; and looks for areas in which the buildings could become more energy-efficient. Boris Savkovic, the company's lead data scientist, described data visualization as a "first pass" in that process.
Savkovic and his team create advanced machine learning algorithms using MathWorks' MATLAB software. The algorithms take into consideration variables such as historical energy usage, weather forecasts, power meter readings, information from HVAC pressure sensors and energy cost data. It's a lot to take in all at once, so the analysts start by employing some simple data visualization techniques. Generally, they put a couple of variables into a line plot to see if the metrics track together. If so, that might be grounds for investigating whether there's a true statistical correlation and building an analytical model around the data.
"Visualization is the bread and butter," Savkovic said. "It helps expose patterns over time as well as patterns between different variables. Plotting a number of variables helps paint a picture as to what issues might be present in a given building."
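The follow-up step Savkovic describes, checking whether two variables that appear to track together are statistically correlated, can be sketched with a Pearson correlation coefficient. This is a minimal Python example under stated assumptions, not BuildingIQ's MATLAB code: the `pearson` helper and the temperature and energy readings are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily readings for one building.
outdoor_temp_c = [18, 21, 25, 29, 31, 27, 22]
hvac_energy_kwh = [310, 340, 395, 450, 470, 420, 355]

r = pearson(outdoor_temp_c, hvac_energy_kwh)
print(f"Pearson r = {r:.2f}")
```

A coefficient near 1 or -1 suggests a strong linear relationship that may be worth modeling, while a value near 0 suggests that any apparent pattern in the line plot deserves a closer look before analysts build on it.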
Predictive analytics programs are becoming more common in organizations, fueled partly by the rise of big data architectures and the increasing commercialization of machine learning technologies. As a result, predictive modeling and data visualization tools appear to be developing an even greater affinity for one another.
In an ongoing survey conducted by TechTarget Inc., data visualization was the top business intelligence and analytics technology that respondents said their organizations had invested in during the six months before they took the survey. As of late August, 43.5% of 2,950 respondents reported recent data visualization purchases. Meanwhile, predictive analytics ranked fourth (20.7%) on the list of technologies that the respondents were asked about.
However, the two technologies essentially tied for the top spot on planned investments over the next 12 months. Predictive analytics was narrowly ahead, selected by 38.3% of 3,980 respondents, while data visualization came in at 37.8% (see "Ties That Bind"). Those results jibe with the findings of a separate "BI and Big Data Analytics Market Landscape Study," also conducted by TechTarget. Based on a survey of 612 IT, BI and analytics professionals in late 2015 and early 2016, the study ranked data visualization as the top technology on "spending intensity" yet placed predictive analytics first on a "momentum index" showing increased interest in implementation.
The ties between the two technologies don't just apply to the analytics planning stage. Data visualization techniques and tools can also help keep the development and "training" of predictive models on track. In this highly technical phase of the analytics process, the popular image of a data scientist hunched over a keyboard unspooling lines of code isn't far from the truth. But it can be easy to lose your way in a maze of parentheses, brackets and commands. At this point, a picture can be worth a thousand lines of code.
Brendan Herger, a data scientist at banking and credit card company Capital One, based in McLean, Va., said he uses data visualization software to monitor the data coming out of predictive models as he writes and tests them. That helps him see whether a model is working as expected and whether its output makes sense. Herger uses H2O to build and run the models as part of machine learning applications, and he visualizes the data with H2O Flow, a web-based interactive user interface offered by vendor H2O.ai.
In addition to letting Herger visualize data for his own benefit, Flow lets him share the results of his work with other members of the data science team at Capital One so they can also take a look and confirm the effectiveness of the predictive models he's building. "It's pretty cool to be able to spot-check and make sure the data looks right," he said.
All-inclusive analytics loop
Reporting on the results generated by predictive models is where effective data visualization techniques can really pay off -- or, conversely, where advanced analytics initiatives can go awry. If data scientists aren't able to show corporate executives and business managers that finished predictive models are delivering worthwhile information with the potential to improve internal decision making and operational processes, support may dry up, and analytics projects could be cut back or abandoned altogether.
"It's critical to visualize a model when you're presenting it to business executives," said Brett Spicer, lead business insight analyst at ArcBest Technologies, the IT subsidiary of freight and logistics company ArcBest Corp. in Fort Smith, Ark. "They need to see [the data] in a way that's comprehensible."
Currently, ArcBest has one predictive model in production, used in its truckload brokerage service, which connects corporate customers looking to ship goods with third-party trucking companies that have available capacity. Spicer said the model, developed in R, helps ArcBest employees match loads with freight carriers more efficiently than they can do manually. Reports with embedded data visualizations are created using MicroStrategy's BI and analytics software to share information about the matching process, he added.
Likewise at Macys.com, Zhao's analytics team uses the data visualization tools it has deployed to generate reports for marketing managers on email campaign performance and popular products. He said the visualized data shows the marketers whether they're promoting the right products to the right customers, helping to make the marketing operations more data-driven -- something that would be harder to achieve otherwise with business users who lack advanced quantitative analysis skills.
"Visualization makes data accessible to a much wider audience, and that helps grow the analytics culture of the organization," Zhao said.
Adding context to raw data
Most of the data being analyzed in predictive modeling and big data analytics projects is nothing more than a collection of ones and zeros. On its own, the data doesn't mean much. It needs context, and that's what data visualization can provide.
Omega Point Research Inc. sells analytics software that uses machine learning algorithms to check investment portfolios against a set of economic indicators to assess potential financial risks. The platform, built around the Databricks distribution of the Spark processing engine and Spark's MLlib machine learning library, was developed by a team of Ph.D.s, some with experience doing high-energy particle physics work at the CERN research laboratory in Switzerland. But to Omer Cedar, Omega Point's co-founder and CEO, the technical capabilities of the New York company's machine learning models aren't any more important strategically than the ability to provide visually engaging reports to investment managers.
"Our attention to the visualization piece is just as important as the attention to the machine learning," Cedar said, adding that the analytics data generated by the company's algorithms "isn't useful to a human being unless it's visualized in an intuitive way."