BOSTON -- Text analytics, while still a nascent technology, is slowly but surely growing in popularity in the enterprise, according to IDC.
Speaking at the Text Analytics Summit, held this week in Boston, IDC's Sue Feldman and Hadley Reynolds said the market for text analytics – the process of mining text-based, unstructured data for patterns or other insights – grew by 20% year-over-year between 2006 and today, and, though slowing because of the recession and overall decline in IT spending, it will continue to expand over the next two to three years.
IDC estimates that the enterprise search market, which includes text analytics software and applications, will reach $3 billion by 2012, up from a current estimate of $2.25 billion. Market drivers include a desire for improved customer relationships, more sophisticated predictive analytics capabilities and concerns over eDiscovery compliance, according to Feldman.
Among the handful of companies at the summit currently using text analytics software was Biogen Idec, a biotech firm based in Cambridge, Mass. The company uses text analytics software to mine journal articles and other medical literature for information that will help its scientists and researchers develop new drug treatments for diseases like lymphoma and rheumatoid arthritis, according to William Hayes, director of library and literature informatics at Biogen Idec.
The biotechnology and related industries spend more than $1 trillion on biomedical research in the U.S. annually, Hayes said, and much of that information is published as scholarly articles in publications like the New England Journal of Medicine. Text analytics software from vendors Linguamatics and InforSense helps Biogen Idec mine these text-based documents, based on pre-designed models and keywords, for relevant information on chemical compounds and other substances that make up the drugs it is considering developing.
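To give a rough sense of what keyword-driven mining of documents can look like, here is a minimal Python sketch. It is not how the Linguamatics or InforSense products work; the keyword list, the sample documents and the function name `find_keyword_hits` are all hypothetical, and the sketch only does whole-phrase matching on naively split sentences.

```python
import re
from collections import defaultdict

# Hypothetical keyword list; a real deployment would rely on curated vocabularies.
KEYWORDS = ["lymphoma", "rheumatoid arthritis", "side effect"]

# Hypothetical sample documents standing in for journal articles.
DOCUMENTS = {
    "a1": "Compound X reduced lymphoma growth. It was well tolerated.",
    "a2": "No relevant findings were reported.",
}


def find_keyword_hits(documents, keywords):
    """Return {keyword: [(doc_id, sentence), ...]} for whole-phrase,
    case-insensitive matches."""
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits = defaultdict(list)
    for doc_id, text in documents.items():
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            for kw, pattern in patterns.items():
                if pattern.search(sentence):
                    hits[kw].append((doc_id, sentence.strip()))
    return dict(hits)


hits = find_keyword_hits(DOCUMENTS, KEYWORDS)
for keyword, matches in hits.items():
    for doc_id, sentence in matches:
        print(f"{keyword} [{doc_id}]: {sentence}")
```

Exact matching of this kind misses plurals, synonyms and abbreviations, which is one reason commercial tools pair keywords with linguistic models.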
And with each new drug therapy costing on average $125 million and taking two to four years to develop, every bit of relevant information Biogen Idec researchers can find is valuable, Hayes said. In some cases, information derived from text analytics, like unexpected side effects that another researcher may have discovered and published in a medical journal, leads the company to abandon potential drug therapies, saving millions of dollars that can be allocated to more promising drugs.
There would be no way to tap the vast amount of medical literature for this type of valuable information without text analytics software, Hayes said. Manually combing through the articles simply isn't cost-effective or scalable.
But the text analytics software that Biogen Idec uses is far from perfect, he said. Around 30% to 70% of the returns the software generates aren't particularly useful, requiring humans to sift through them for the valuable information. "There's still always a fairly good amount of noise [in the returns]," Hayes said.
In fact, while a huge benefit of automated text mining is the reduced need for manual searching of unstructured data, significant manpower is still often needed, depending on the use case, according to a number of text analytics vendors who addressed summit attendees. The group included James Cox, director of text mining at SAS Institute, and David Bean, CTO of Attensity.
Text analytics software, they agreed, is good at mining text-based data to identify large trends. But it is less accurate when analyzing individual content items to determine their sentiment – like whether a particular email from a customer has a negative or positive message about a product – because of the many nuances of spoken language. In those cases, people are sometimes needed to interpret text analytics results.
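The gap between trend-level and item-level accuracy can be illustrated with a toy example. The Python sketch below is not any vendor's method; the word lists, the sample messages and the functions `score` and `label` are hypothetical. It uses a naive word-count scorer that mislabels a message containing a negation ("Not bad at all") while still getting the overall trend right.

```python
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"great", "love", "good", "excellent"}
NEGATIVE = {"bad", "broken", "terrible", "hate"}


def score(text):
    """Count positive words minus negative words, ignoring word order."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


# Hypothetical customer messages.
messages = [
    "I love this product, it is excellent.",
    "Great value and good support.",
    "Good battery life.",
    "The update is terrible and the app is broken.",
    "Not bad at all, works fine.",  # a person reads this as positive
]

labels = [label(m) for m in messages]
trend = Counter(labels)

for message, lab in zip(messages, labels):
    print(f"{lab:8} | {message}")
print("Overall:", trend.most_common(1)[0][0])
```

The last message is scored as negative because the scorer sees "bad" but not the negation, yet the majority label across all five messages is still positive. That is the kind of case where a person may be needed to interpret an individual result.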
IDC's Feldman agreed that text analytics software still has a long way to go but said demand for the technology is "catching on" as IT departments continue to transition from "transaction-based to language-based computing." She and Hayes also predicted a period of market consolidation as the mega-vendors look to get ahead of customer demand and add text analytics capabilities to their larger enterprise software stacks.
The wave of consolidation may actually have already begun. In the last two years, SAS Institute bought text analytics specialist Teragram, SAP added text analytics technology to its repertoire with its acquisition of Business Objects, and Microsoft did the same when it bought FAST Search & Transfer.