Scott Phillpott has been mining and visualizing data since 1998, and while his analyses are motivated by a “bottom line” principle similar to that of banks or retail organizations, the intelligence derived from Phillpott’s data serves very different purposes.
Phillpott, a former Navy captain, is a cyber and maritime analyst for the Virginia Beach, Va.-based Valkyrie Enterprises LLC, which provides engineering and technical services to clients such as the Department of Defense and the Navy. Phillpott is involved in business development with the company’s burgeoning cyberoperations branch, which also means performing ad hoc cyberintelligence duties. In other words, he keeps tabs on computer network attacks, online virus outbreaks and cyber terrorism attempts.
To do so, Phillpott uses
“The more ways you can look at the data, the better,” Phillpott said.
One he’s more recently implemented is from the small startup Recorded Future, headquartered in Cambridge, Mass., and Göteborg, Sweden. Recorded Future’s technology presents users with a Web interface that operates much like a search engine. Yet rather than asking an engine to find every reference to a particular request, Recorded Future takes it one step further by coupling topics with time.
Deep dives into Web data
Recorded Future teases out facts from news stories, government sources, blog posts and selected Twitter feeds. Facts are organized into categories -- such as people, places, products and businesses -- and timestamped to indicate when the event in question happened or is expected to happen. Recorded Future then weighs search results with a set of metrics like source, repetition of information and sentiment before organizing them into digestible, visual dashboards, turning unstructured data into structured data.
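To make that pipeline concrete, here is a minimal Python sketch of how extracted facts might be categorized, timestamped and scored. It is not Recorded Future's actual method: the Fact record, the SOURCE_WEIGHT table and the rank_facts function are hypothetical names invented for illustration, and the scoring formula is an assumption based only on the three metrics named above (source, repetition and sentiment).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One extracted mention (hypothetical structure, for illustration only)."""
    entity: str       # e.g. a company or person
    category: str     # "person", "place", "product" or "business"
    event_date: date  # when the reported event happened or is expected
    source: str       # kind of source the mention came from
    sentiment: float  # -1.0 (negative) to 1.0 (positive)


# Assumed trust weights per source type; the numbers are invented.
SOURCE_WEIGHT = {"government": 1.0, "news": 0.8, "blog": 0.5, "twitter": 0.3}


def rank_facts(facts):
    """Return (entity, event_date, score) tuples, highest score first."""
    scores = defaultdict(float)
    for fact in facts:
        weight = SOURCE_WEIGHT.get(fact.source, 0.1)
        # Repeated mentions accumulate, and stronger sentiment in either
        # direction adds to each mention's contribution.
        scores[(fact.entity, fact.event_date)] += weight * (1.0 + abs(fact.sentiment))
    ranked = [(entity, day, score) for (entity, day), score in scores.items()]
    return sorted(ranked, key=lambda row: row[2], reverse=True)
```

In this sketch, an event mentioned by several trusted sources rises to the top of the list, which is roughly the behavior a dashboard would need in order to surface the most relevant items first.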
“Rather than sit there and fumble through Google or fumble through other tools, I can create a template that does that deep-dive search for me,” said Phillpott, who is the only Valkyrie employee currently using the technology.
Phillpott uses the technology to monitor subjects he’s determined are important to keep up to date on. With Recorded Future, he can essentially “follow” those subjects across the Web, and that kind of data has the potential to provide insight he may not have encountered otherwise.
For example, Phillpott said he’ll create a watch list of Valkyrie partners to aggregate and map out what organizations are doing, what they may be interested in and what they’re talking about publicly. That kind of information can help Valkyrie provide better customer service to its partners, he said.
“Let’s say they’re going after a contract, and we know we have some skill set that could help,” he said. “We could add value to them.”
Phillpott said the technology is highly intuitive and easy to use, but noted the difficulty of importing and exporting data. Data points are identified by Recorded Future, and he cannot import other related data he’s found into the system or export that data for use with another data visualization tool. Currently, anytime he finds data that should cross over from one tool to the other, he performs the task manually.
Plus, he said, some of the searchable entities he may be interested in have not been identified by Recorded Future.
“Some of the companies I’m looking at are really small, and they maybe don’t have a big presence and haven’t been identified by Recorded Future,” Phillpott said. “Sometimes you have to go back in … and teach the system a new entity. And that’s cumbersome.”
Time + analytics = ‘temporal analytics’
But Recorded Future is more than a monitoring device. While the front end of the tool is a Web interface, the back end is a bit more complicated.
Recorded Future sifts through 100,000 Web pages an hour, according to the company’s CEO and co-founder Christopher Ahlberg. The technology uses text and sentiment analytics to organize historical and forward-looking patterns and make them available for consumption.
Ahlberg, one of the original forces behind the data visualization tool Spotfire, which was later acquired by Tibco, calls the machine a temporal analytics engine; others have referred to it as a more advanced version of Google and a method for spying. (Along those lines, it has attracted attention -- and investment -- from both Google and In-Q-Tel, which invests in technology on behalf of the U.S. government.) But it’s also been referred to as a “prediction machine.”
When Ahlberg describes the technology, it’s under the latter umbrella. For many businesses, the holy grail of analytics falls to predictive analytics: Does a company’s data contain undiscovered patterns that could remove some of the riskiness of gut decision making or spot trends before they’ve trended? Recorded Future is operating on a similar theory, but its database is huge: the entire Web.
Currently, Phillpott is using the tool for more organizational purposes -- to keep abreast of important dates that are happening now and down the road. But Ahlberg is hoping his analytics engine will actually be used to spot patterns before an event becomes an event.
A Recorded Future white paper on “big data” compares the technology with “niche examples” like Google’s Flu Trends, which was able to detect flu outbreaks based on Google search patterns.
“By applying temporal analytics to what is written about the future, and by algorithmically crowd-sourcing the resulting temporal information, we can draw conclusions and gain insight,” reads the paper, which was written by Staffan Truvé, chief scientist and Recorded Future co-founder.
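One way to picture the idea in that passage is to group forward-looking mentions by topic and date, then count how many independent sources point to the same future event. The Python sketch below is only an illustration under that assumption. The upcoming_events function and the (topic, event_date, source) tuple layout are hypothetical and do not come from the white paper.

```python
from collections import defaultdict
from datetime import date


def upcoming_events(mentions, today):
    """Group forward-looking mentions by (topic, date), counting distinct sources.

    `mentions` is an iterable of (topic, event_date, source) tuples, a layout
    assumed for this sketch.
    """
    sources = defaultdict(set)
    for topic, event_date, source in mentions:
        if event_date > today:  # keep only what is written about the future
            sources[(topic, event_date)].add(source)
    # Soonest events first; more independent sources break ties.
    return sorted(
        ((topic, day, len(found)) for (topic, day), found in sources.items()),
        key=lambda row: (row[1], -row[2]),
    )
```

Counting distinct sources instead of raw mentions is a deliberate choice here: a single outlet repeating itself should not look like broad agreement.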
But the “prediction engine” comes with a caveat: Text analytics is still a maturing technology, and that means potential errors can be introduced when aggregating data.
“Accuracy is an issue,” said Ahlberg. But he believes the issue is a small one. On the whole, he said 80% to 85% of the text analytics are dead-on.
For someone like Phillpott, misreads of the data are to be expected. “It’s not a perfect system,” he said.