In the last few years, data visualization tools have enjoyed a surge in interest; techniques have matured beyond basic pie charts and bar graphs into advanced infographics that include heat maps, scatter plots and arc diagrams.
But the more advanced tools tend to lack the ability to make use of one of the fastest-growing data types: text. Based on user comments and presentations at the eighth annual Text Analytics Summit in Boston last week, businesses are ready for more sophisticated ways to turn text content into something intuitive, easy to consume -- and visual.
“Data is data,” said Erica Driver, senior director for global product marketing at QlikTech and a summit attendee. “Pictures help us to understand the data’s implications easily and quickly.”
The death of tag clouds
Text visualization tools do exist, and one of the best-known examples is the tag cloud. A tag cloud sizes words by frequency of use, so that high-volume words are depicted in larger point sizes and bolder colors, and low-volume words appear smaller and duller.
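The sizing logic behind a tag cloud can be sketched in a few lines of Python. This is only an illustration of the general idea, not any vendor's implementation: the function name `tag_cloud_sizes`, the point-size range and the simple linear scaling are all assumptions, and a real tool would typically also filter out common words such as "the" and "and."

```python
from collections import Counter
import re


def tag_cloud_sizes(text, min_pt=10, max_pt=48, top_n=20):
    """Map the most frequent words in `text` to font sizes in points.

    Hypothetical helper: linear scaling between min_pt and max_pt
    is an assumption made for illustration.
    """
    # Split the text into lowercase words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}

    highest = counts[0][1]   # count of the most frequent word
    lowest = counts[-1][1]   # count of the least frequent word kept
    spread = highest - lowest

    sizes = {}
    for word, n in counts:
        # If every word is equally frequent, give them all the largest size.
        scale = (n - lowest) / spread if spread else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


if __name__ == "__main__":
    print(tag_cloud_sizes("park park park hotel hotel ride"))
```

The output is nothing more than a word-to-size lookup. It records how often each word appears and nothing about the words around it.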
But tag clouds can take the analysis only so far, according to summit speakers like David Williams, manager of marketing analytics and optimization for Orlando, Fla.-based Walt Disney World. While tag clouds can visualize what's being said, they provide no insight into why or how the words are being used.
“You’re just looking at words,” Williams said during his presentation, “and you don’t know the context of those words.”
To dig deeper, Disney uses a tool from SAS to map the relationships between words. Using a web-like visualization called concept linking, Williams and his team explore how closely words are connected to a specific concept.
“SAS actually shows, depending on the thickness of the line [linking the words together], the strength of that relationship,” Williams said.
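The idea of drawing thicker lines for stronger relationships can be illustrated with a rough Python sketch. The article does not describe how the SAS tool measures relationship strength, so this example substitutes a simple stand-in: it counts how many comments mention a word alongside the chosen concept. The function name `concept_links`, the `max_width` parameter and the sample comments are all hypothetical.

```python
from collections import Counter
import re


def concept_links(comments, concept, max_width=8.0):
    """Return a line width for each word that co-occurs with `concept`.

    Assumption: relationship strength is approximated by the number of
    comments in which a word appears together with the concept. A
    commercial tool may use a very different measure.
    """
    co_occurrences = Counter()
    for comment in comments:
        # Use a set so each word counts at most once per comment.
        words = set(re.findall(r"[a-z']+", comment.lower()))
        if concept in words:
            co_occurrences.update(words - {concept})

    if not co_occurrences:
        return {}

    strongest = max(co_occurrences.values())
    # The most strongly linked word gets the thickest line.
    return {word: max_width * n / strongest
            for word, n in co_occurrences.items()}


if __name__ == "__main__":
    sample = ["hotel pool great", "hotel pool", "hotel staff", "park ride"]
    print(concept_links(sample, "hotel"))
```

Unlike a plain frequency count, the result is tied to one concept, which is what lets an analyst see which words cluster around it.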
Disney studies user inquiries from internal sites like the Walt Disney World Mom’s Panel as well as comments from external sites like TripAdvisor where visitors can rate their experiences at Disney’s hotels or resorts. Williams and his team mine the text to find patterns that they hope will help inform marketing initiatives.
“We wanted to analyze the text and see what’s driving the number of stars people give,” Williams said. “And what’s driving the positive reviews.”
Really, really big text
A big problem for text visualization is text's lack of structure. Data can be plotted as points on a line, but text is nuanced and presents gradations that make it hard to categorize, said Susan Feldman, vice president of search and discovery technologies for Framingham, Mass., research group IDC and a summit attendee.
“Text exists on a continuum, and that’s not easy,” said Feldman, adding that, for example, some language can be negative or kind of negative, positive or kind of positive. “If you view the collection of information in a simplistic way, it won’t do you much service.”
And visualizing text may mean backing away from what's become comfortable and standard for business intelligence (BI) professionals.
“For people from the BI side of things, the paradigm is the dashboard,” said Seth Grimes, consultant and owner of Alta Plana Corp. in Takoma Park, Md., and founding chairman of the summit. “They want to push text into the dashboard. But visualizing whatever source needs to match the properties of the source.”
Even so, and especially for those who have to find relevant information in mounds of text, visualizations hold the key to discovering outliers or uncovering patterns that may otherwise have gone unnoticed. Jason Baron, director of litigation for the National Archives and Records Administration based in College Park, Md., is an advocate for visualizations in law.
“When it’s up to 1 petabyte of information, we can’t deal with it,” he said during his presentation. “I encourage people in this room to think about what lawyers need and what you can do for them to help them look at the space.”
While the legal industry has been slow to embrace newer e-discovery techniques, a recent ruling has sparked debate among professionals on the use of technology-assisted review, which uses algorithms to predict document relevancy rather than relying on cruder keyword searches. Now Baron wants to see the industry push even further and use visualization tools that can map out, for example, who talked to whom and when.
“Keywords are not giving us enough data,” he said. “We need visualizations to see what’s really going on.”
But lawyers -- and users in general -- will have to continue to wait, Feldman said. While some vendors and businesses are experimenting with ways to visualize social networks, for example, the resulting visualizations can become too complex and hard to read, or too watered down to capture vital details.
“The huge challenge is there are no standards,” she said. “So how do you find the black swan or serendipitous idea? We don’t know how to do that yet, especially with a billion documents.”