This article originally appeared on the BeyeNETWORK.
My previous article in this series, Using Enterprise 2.0 for Business Intelligence,
The Role of Unstructured Content in Collective Intelligence
Collective intelligence represents the convergence of three key decision-making technologies: business data intelligence, business content intelligence and business collaboration (see Figure 1). This article focuses on business content intelligence, which extends traditional business intelligence (called business data intelligence in this article) with analytics created from unstructured business content.
Figure 1: Collective Intelligence
Although unstructured business content represents about 80% of the information that exists in organizations, not all of this information is in a readily accessible form, and little of it is used at present by BI analytical applications. Improvements in content management technology, however, are leading to more of this business content being stored and managed in shared databases, which improves not only ease of access but also content quality and business value. These technology improvements, coupled with the explosive growth in web-based content, are now causing organizations to evaluate the use of unstructured business content for improving the decision-making process.
Searching and Exploring Unstructured Content
Corporate business users are demanding Google-like search capabilities in the enterprise because they think this will allow them to search and explore business content and business data as easily as they can access information on the public Internet. Web search and enterprise search, however, are quite different.
Web search involves simple two- or three-word search queries that are applied to popular web content. The focus of web search is on speed rather than accuracy, and search results are based primarily on content popularity.
Enterprise search, on the other hand, involves a much broader range of data sources, and the focus in this environment is more on accuracy than speed. Content popularity ranking systems are not appropriate for enterprise use, which is why enterprise search tools require semantically richer interfaces involving techniques such as faceted search and natural language query processing. These techniques work in conjunction with business content analysis facilities to extract metadata that helps search engines filter source information to better match the business terms in search queries. Corporate business users are also more knowledgeable than typical web users about the information they need to access and are, therefore, more capable of writing better search queries, which further improves search engine filtering accuracy.
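To make the idea of faceted search over extracted metadata more concrete, the following is a minimal sketch rather than a description of any vendor's product. It assumes that content analysis has already attached facets to each document. The sample documents, the facet names (`type`, `product`) and the functions `search` and `facet_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical documents; the "facets" are assumed to have been
# extracted earlier by a business content analysis step.
DOCS = [
    {"id": 1, "text": "Customer reports battery overheating in model X200",
     "facets": {"type": "support report", "product": "X200"}},
    {"id": 2, "text": "Quarterly pricing review for model X200",
     "facets": {"type": "memo", "product": "X200"}},
    {"id": 3, "text": "Customer praises battery life of model Z10",
     "facets": {"type": "email", "product": "Z10"}},
]


def search(docs, query, filters=None):
    """Return documents that contain every query term and match every facet filter."""
    terms = query.lower().split()
    filters = filters or {}
    hits = []
    for doc in docs:
        words = set(doc["text"].lower().split())
        matches_terms = all(term in words for term in terms)
        matches_facets = all(
            doc["facets"].get(name) == value for name, value in filters.items()
        )
        if matches_terms and matches_facets:
            hits.append(doc)
    return hits


def facet_counts(hits, facet):
    """Count how many hits fall under each value of one facet."""
    return Counter(doc["facets"][facet] for doc in hits if facet in doc["facets"])
```

A user could start with a keyword query such as `search(DOCS, "battery")`, look at `facet_counts` for the `product` facet, and then narrow the results with `search(DOCS, "battery", {"product": "X200"})`. Filtering on metadata, rather than ranking on popularity, is what allows this kind of interface to favor accuracy.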
Techniques used to extract metadata from business content can also be used to produce analytics from that content. This is especially the case for text data. Search vendors such as Endeca and Fast Search & Transfer (FAST) – currently being acquired by Microsoft – have added analytical dashboard front ends to their search tools and are marketing them as alternatives to analytical tools from BI vendors.
IBM offers capabilities similar to those of Endeca and FAST, but recognizes that this approach is best suited to applications that focus primarily on business content, rather than business data. For environments such as business intelligence, where companies are interested in adding business content analytics to applications that primarily use structured business data, IBM allows metadata extracted during business content analysis to be used to help migrate unstructured business content to a data warehousing and business intelligence environment. This migration usually involves converting the required unstructured business content into a semi-structured (XML, for example) or structured format. The acquisitions of Inxight by Business Objects (an SAP company) and of Teragram by SAS demonstrate that these vendors are moving in a similar direction to IBM.
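As an illustration of what converting unstructured content into a semi-structured format can look like, here is a small sketch using Python's standard `xml.etree.ElementTree` module. It is not how any of the vendors mentioned implement the migration. The product-code pattern, the list of issue terms and the XML element names are assumptions made purely for the example.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: product codes look like one capital letter followed by 2-4 digits.
PRODUCT_PATTERN = re.compile(r"\b[A-Z]\d{2,4}\b")
# Assumption: a tiny, hand-picked vocabulary of terms that signal a problem.
ISSUE_TERMS = {"overheating", "broken", "refund"}


def to_xml(doc_id, text):
    """Wrap free text in XML, adding metadata elements extracted from the text."""
    root = ET.Element("document", id=str(doc_id))
    meta = ET.SubElement(root, "metadata")

    # One <product> element per distinct product code found in the text.
    for code in sorted(set(PRODUCT_PATTERN.findall(text))):
        ET.SubElement(meta, "product").text = code

    # One <issue> element per known issue term found in the text.
    words = set(re.findall(r"[a-z]+", text.lower()))
    for term in sorted(words & ISSUE_TERMS):
        ET.SubElement(meta, "issue").text = term

    # The original content is kept alongside the extracted metadata.
    ET.SubElement(root, "body").text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `to_xml(1, "Customer reports X200 overheating")` produces a document whose `<metadata>` section can be loaded by downstream tools, while the `<body>` keeps the source text. A real content analysis facility would use far richer linguistic techniques than a regular expression and a word list.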
Using Search in Business Intelligence
The direction of the market is clearly toward the convergence of enterprise search with business intelligence (see Figure 2). Most major BI vendors now offer web-based portals to their products. These portals allow business users to use a search interface to locate and explore reports produced by BI processing, and to find and run canned queries and analyses. Some vendors, such as Progress EasyAsk, also add a natural language search and analysis interface to data warehouse data.
Figure 2: Search and BI Convergence
One of the real benefits of search technology to business intelligence, however, is the ability to use this technology to access unstructured business content and convert it into a format that can be used by standard BI tools. As already mentioned, this is the direction of key vendors such as Business Objects, IBM and SAS. Other BI vendors are likely to follow.
The question I often get asked is, “What benefits does unstructured business content bring to business intelligence?” The main answer to this question is that there is significant business information and value in unstructured business content. Web logs, e-mail and support center reports, for example, enable companies to get valuable insight into customers’ attitudes toward product value and quality. Competitors’ website data is being used in the travel and retail industries to build competitive pricing models. In some cases, business content is being used to supplement data warehouse information. In other situations, analytics are being built by directly querying the source content.
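The case where business content supplements data warehouse information can be sketched as follows. The example assumes that product and issue metadata have already been extracted from support center reports and that sales figures are available as structured rows. All of the data, field names and the `complaints_per_1000_units` function are hypothetical.

```python
from collections import Counter

# Hypothetical structured rows, as they might be read from a data warehouse.
SALES = [
    {"product": "X200", "units_sold": 1200},
    {"product": "Z10", "units_sold": 800},
]

# Hypothetical metadata extracted from unstructured support center reports.
REPORTS = [
    {"product": "X200", "issue": "overheating"},
    {"product": "X200", "issue": "refund"},
    {"product": "Z10", "issue": "refund"},
]


def complaints_per_1000_units(sales, reports):
    """Combine content-derived complaint counts with structured sales data."""
    counts = Counter(report["product"] for report in reports)
    rates = {}
    for row in sales:
        if row["units_sold"] == 0:
            continue  # no meaningful rate without sales
        rate = 1000 * counts[row["product"]] / row["units_sold"]
        rates[row["product"]] = round(rate, 2)
    return rates
```

With the sample data, `complaints_per_1000_units(SALES, REPORTS)` yields a complaint rate per product, a measure that neither the sales data nor the support reports could provide alone.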
In summary, the convergence of search with business intelligence provides three main options to business users:
- The ability to search and explore business content (including BI reports) and business data using a familiar search metaphor.
- The option to use simpler natural language interfaces to business data.
- The creation of business content intelligence to extend the capabilities of the collective intelligence environment.
It is important, therefore, that organizations track these trends and developments and evaluate their use in the BI environment.