This article originally appeared on the BeyeNETWORK.
When describing a set of systems as a “stack,” one assumes that the components of the stack are interdependent, interoperable and modular. When it comes to business intelligence (BI) stacks, however, the waters become murkier – but looking at business intelligence applications as solution stacks can simplify managing projects across the enterprise as well as highlight areas where open source (or commercial) software could improve the bottom line.
When the goal of business intelligence is to enable end users to use any and all corporate data to create actionable knowledge and generate competitive advantage, theoretically any software used within the enterprise will become business intelligence software. Consider that business intelligence applications draw data from corporate systems ranging from accounting to human resources to sales to supply chain management. That data may be used in a multitude of packaged and custom end-user applications associated with business intelligence, such as dashboards, business projections, analytics and custom reporting – but it may also be incorporated into generic office productivity applications such as word processing, presentations and spreadsheets.
Oddly, it is easier to view business intelligence as “architecture” because the solutions seem so complex. BI systems are inextricably tied in with so many of an enterprise's systems, and so many of the components interact across so many boundaries.
Rather than attempting to design an all-encompassing BI architecture, it is useful to consider business intelligence as a collection of application stacks. Understanding how interdependence, interoperability, and modularity help define these BI software stacks can simplify the process of incorporating open source components into your BI stack, while at the same time reducing the overall cost of managing all enterprise BI components, whether open source or proprietary.
What is a Stack?
“Stack” means many things in IT (see http://en.wikipedia.org/wiki/Stack). A protocol stack defines a sequence of data formats and permitted transformations for manipulating data and moving it through a system. TCP/IP network protocol stack implementations consist of a sequence of software components designed to allow data to pass from hardware interfaces transmitted as a physical signal, through several layers of abstraction to an end user, and then back down again.
A network stack built to standard protocols implies interoperability; you should be able to mix and match software from different sources at the different layers (see http://en.wikipedia.org/wiki/Tcp/ip#ayers_in_the_Internet_protocol_suite_stack for an explanation and illustration of the TCP/IP stack). For example, before Microsoft started integrating TCP/IP into Windows, IP networking with Windows required software from your network interface card vendor along with TCP/IP from a third-party vendor and application software from yet another vendor.
Unlike network protocol stacks, BI stacks are considered “solution stacks” or “software stacks.” These systems may or may not operate on data in a predefined sequence of formats and layers of abstraction. The best-known software stack is the open source standard LAMP (Linux, Apache, MySQL and PHP/Perl/Python) Web application platform (check out the LAMPware community site).
Just as TCP/IP protocols are often diagrammed using boxes labeled with protocol acronyms stacked on top of each other, solution stacks such as LAMP can be diagrammed with boxes labeled with software packages stacked on each other.
Likewise, LAMP component systems interact with each other in predictable ways and at predefined abstracted layers. At the top are the applications and/or scripts developed with Perl/PHP/Python; these interoperate with the servers – Apache Web server and MySQL database server – which in turn interoperate with data stored on a system running the Linux OS.
Finally, the LAMP platform is modular. MySQL could be replaced with Oracle, Linux could be replaced by Windows, BSD, Apple MacOS or any other OS. The LAMP platform can be modified either by plugging and playing with one or more of the four basic components or adding additional components on top of the stack.
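To make the plug-and-play idea concrete, here is a minimal Python sketch that models a stack as named layers and swaps a single component. The `Stack` class and its field names are assumptions made up for this illustration; they do not come from any LAMP tooling.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stack:
    """A solution stack, described from the bottom layer to the top (hypothetical model)."""
    os: str
    web_server: str
    database: str
    language: str

    def layers(self):
        """Return the components ordered from the bottom of the stack upward."""
        return [self.os, self.web_server, self.database, self.language]


# The classic LAMP combination.
lamp = Stack(os="Linux", web_server="Apache", database="MySQL", language="PHP")

# Swapping one component leaves the rest of the stack untouched.
wamp = replace(lamp, os="Windows")
lamp_with_oracle = replace(lamp, database="Oracle")

print(wamp.layers())
print(lamp_with_oracle.layers())
```

Because each variant is a copy with one field changed, the original definition stays available for comparison, which is the property that makes a modular stack easy to reason about.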
Layers of Abstraction and BI Architectures
Stacks such as LAMP or TCP/IP are used to illustrate the most direct network path that allows end users to interact with physical data. The lowest level possible for a software stack is an interface to a hardware source of data; the highest level possible for a software stack is the user interface.
The layers of abstraction are reasonably clear in these two cases. Internet users (people) interact with a Web browser, the Web browser accepts data from the user and formats it for use by a process running on the user's computer, and the process repackages the data so it can be sent over the Internet. The final step is to convert IP packets for transmission down a wired or wireless network interface.
On a LAMP system, the “P” elements are application programs that provide an interface for end users to access data on the system. Those programs and/or scripts interact with the database and Web servers, parsing user input into a format the servers can understand. Finally, the servers interact with the OS, Linux, to access the data stored physically on the system hardware.
The problem for BI is that it isn't really a single “thing” that you can define a single stack for, but rather an architecture potentially full of compatibility issues, system requirements and custom software to integrate.
BI stacks are different because the “shortest path” through the network of dependencies (that is, system requirements for underlying systems/platforms, compatibility, sources of input and output) is not always obvious. There may be many different sources of data feeding the BI stack, so one layer of abstraction may be needed to turn all the data from all required sources into a usable format, such as a relational database; more layers may be needed to aggregate data when it is drawn from many different sources. One or more layers may be needed to construct OLAP cubes for analytics or other applications. A security/authorization layer may be needed to permit access to subsets of the data to properly authorized users.
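The layers described above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the two feeds, their formats, the region names and the function names are all hypothetical, and the "cube" is a plain dictionary rather than a real OLAP structure.

```python
from collections import defaultdict

# Hypothetical raw feeds from two source systems, each in its own format.
sales_feed = [("east", 120.0), ("west", 80.0), ("east", 30.0)]
hr_feed = [{"region": "east", "headcount": 5}, {"region": "west", "headcount": 3}]


def normalize(sales, hr):
    """Layer 1: turn every source into uniform (region, metric, value) rows."""
    rows = [(region, "revenue", amount) for region, amount in sales]
    rows += [(r["region"], "headcount", float(r["headcount"])) for r in hr]
    return rows


def aggregate(rows):
    """Layer 2: sum values per (region, metric), a tiny stand-in for an OLAP cube."""
    totals = defaultdict(float)
    for region, metric, value in rows:
        totals[(region, metric)] += value
    return dict(totals)


def authorize(cube, allowed_regions):
    """Layer 3: expose only the cells a user is permitted to see."""
    return {key: value for key, value in cube.items() if key[0] in allowed_regions}


cube = aggregate(normalize(sales_feed, hr_feed))
east_view = authorize(cube, {"east"})
print(east_view)
```

Each function consumes only the output of the layer beneath it, so a layer can be replaced (for example, a different aggregation engine) without touching the others.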
Microsoft has referred to its business intelligence solution as a stack: at the bottom is Microsoft Windows, on which Microsoft SQL Server runs, feeding data into applications including the upcoming Microsoft Office PerformancePoint Server 2007. Still in beta, it is described as “a complete and integrated performance management application that includes business scorecarding, analytics and planning.” The stack may also incorporate Microsoft subsidiary ProClarity's analytics platform, for query and analysis, scorecards, dashboards, reporting and development. Of course, other products from other vendors can be used, as long as they can interoperate with the Microsoft BI stack.
Microsoft BI offers a nicely integrated approach to business intelligence that may be the perfect solution, particularly for SMBs (small- to mid-sized businesses) that already buy all their OS, database and application software from Microsoft. However, using this approach locks you into a single vendor and gives that vendor carte blanche to a large portion of your IT budget.
BI Solution Stacks
Business intelligence isn't a system, or even a system of systems – it is a consumer product, provided to your enterprise customers. Consumers have choices: they can bake a cake from scratch, use a mix, buy an inexpensive cake from the grocer or buy a fancy cake from an elite bakery. Each of those options implies a different set of procedures, different suppliers, different ingredients and different target markets.
Rather than approaching BI as architecture, you could define BI solution stacks based on your “products.” Thus, a BI reporting stack starts at the bottom with an OS and database server, middleware necessary to get data to the BI reporting software, and the report writing application on top.
You could also abstract lower layers into a simplified “cloud,” representing, for example, all the various data sources you have no control over. You'll need an interface to that cloud, but you can simplify your BI solution stacks by defining only the layers and components you need with a data-collection/interface component at the bottom.
When you look at the flow charts of vendors providing “stacks,” it becomes more obvious that they are really architectures and/or networks of applications. You simply take the top-layer applications and track the links from each one down the “stack” to see what the different components of that application stack really are.
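Tracking the links down from a top-layer application amounts to walking a dependency graph. The sketch below assumes a hypothetical `depends_on` map (the component names are invented for illustration) and returns the components that make up one application's stack.

```python
# Hypothetical map: each component -> the components directly beneath it.
depends_on = {
    "reporting_app": ["middleware"],
    "dashboard_app": ["olap_server"],
    "olap_server": ["database"],
    "middleware": ["database"],
    "database": ["os"],
    "os": [],
}


def trace_stack(top, graph):
    """Return every component reachable from `top`, in depth-first order."""
    seen = []
    pending = [top]
    while pending:
        component = pending.pop()
        if component in seen:
            continue
        seen.append(component)
        # Reverse so that dependencies are visited in the order they are listed.
        pending.extend(reversed(graph.get(component, [])))
    return seen


reporting_stack = trace_stack("reporting_app", depends_on)
dashboard_stack = trace_stack("dashboard_app", depends_on)
print(reporting_stack)
print(dashboard_stack)
```

Running the trace for each top-layer application shows which lower components are shared (here the database and OS) and which belong to a single stack.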
Interdependence, Interoperability, Modularity and Vendor Neutrality
BI stacks, like other application stacks, require that you consider each component to be interdependent, interoperable and modular; you may want to have those components be vendor-neutral, but that's not necessarily a requirement.
Interdependence means that each component depends on a lower layer component to operate and provides some necessary service to a component on the next higher layer.
Interoperability means that the components at connected layers are able to talk directly to each other (e.g., there is no special middleware necessary to get MySQL to access and store data on a Linux box).
Vendor neutrality means that you can shop for best-of-class solutions at every layer. For example, you may prefer Windows as your OS. In that case, you could build a WAMP platform: Apache, MySQL and Perl/PHP/Python running on a Windows box.
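One way to test a proposed swap, such as moving from LAMP to WAMP, is to check every link between adjacent layers against a list of pairings known to work directly. The sketch below is an assumption-laden illustration: the `interoperable` table is a hand-written, hypothetical list, and "ExampleOS" is an invented placeholder, not a real product.

```python
# Hypothetical table of (upper, lower) pairs that work together without extra middleware.
interoperable = {
    ("Apache", "Linux"), ("MySQL", "Linux"),
    ("Apache", "Windows"), ("MySQL", "Windows"),
    ("PHP", "Apache"), ("PHP", "MySQL"),
}


def links_for(os_name):
    """List the (upper, lower) links in an Apache/MySQL/PHP stack on the given OS."""
    return [("Apache", os_name), ("MySQL", os_name), ("PHP", "Apache"), ("PHP", "MySQL")]


def broken_links(stack_links, compat):
    """Return the links that are not known to interoperate directly."""
    return [link for link in stack_links if link not in compat]


wamp_problems = broken_links(links_for("Windows"), interoperable)
unknown_os_problems = broken_links(links_for("ExampleOS"), interoperable)
print(wamp_problems)
print(unknown_os_problems)
```

An empty result means every layer can talk to the one beneath it; any link that is returned marks a place where middleware or a different component would be needed.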
Open Source, Rationalization and Vendor Consolidation
Once you're able to isolate the different BI stacks that together form your BI architecture (along with the possible options you've got for swapping components in and out), the process of rationalization of all those systems becomes much easier.
You can analyze your BI stacks in terms of product (do they meet or exceed expectations of the consumer of output?), performance (do bottlenecks exist at some layer?) and cost of ownership (can any of the proprietary or commercial components you use be replaced by non-proprietary or no-cost open source components?).
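The cost-of-ownership question can be roughed out with a simple inventory. Everything in this sketch is hypothetical: the component names, the `Component` fields and the cost figures are placeholders chosen for illustration, not data from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    layer: str
    open_source: bool
    annual_cost: float  # hypothetical figure, in arbitrary currency units


# Hypothetical inventory of one BI reporting stack.
inventory = [
    Component("ExampleOS", "os", False, 1000.0),
    Component("ExampleDB", "database", False, 5000.0),
    Component("ExampleReports", "reporting", True, 0.0),
]


def replacement_candidates(components):
    """Return the proprietary components, most expensive first."""
    paid = [c for c in components if not c.open_source]
    return sorted(paid, key=lambda c: c.annual_cost, reverse=True)


candidates = [c.name for c in replacement_candidates(inventory)]
license_spend = sum(c.annual_cost for c in replacement_candidates(inventory))
print(candidates, license_spend)
```

The total is only an upper bound on possible savings, since an open source replacement may still carry support, migration and training costs.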
You can also use this information to guide decisions about vendor consolidation. Whereas it is to the vendor's benefit to lock you into an entirely proprietary stack, open source vendors may prove more open to interoperability – and prove to be better suited to your long-term BI plans.