This article originally appeared on the BeyeNETWORK.
We hear a lot of talk today about “enterprise” business intelligence, and rightfully so. In the last year or so, we have seen an enormous increase in the demands put upon business intelligence, forcing organizations to rethink their old strategies of siloed or departmental business intelligence. To name just a few of these demands:
- Compliance — whether it is for Sarbanes-Oxley, HIPAA, Basel II, or other legislated requirements, the emphasis is definitely on the enterprise view.
- Customers — the entire customer relationship has changed forever, driven by more demanding customers, CRM, and the enterprise’s shift from a product focus to a customer focus.
- Massive amounts of stored data — we now have both the ability and the demand to make every ATM transaction, every use of a credit card, every movement of an RFID-tagged piece of inventory, and every claim at each of its stages available to everyone in an organization for analysis and study.
- Internet traffic — clickstream analysis has allowed marketers and sales personnel to “see” a customer’s every move and study purchasing behaviors in ways never imagined before.
The ability to get a complete understanding of a compliance issue, a customer’s situation, or inventory trends and movements, or even to quickly change a website configuration based on perceived behaviors, means that everyone must have access to consistent, reliable, and easy-to-use information. That information must be fully integrated across the enterprise and presented in an efficient and effective manner. What is needed is a new level of sophistication in both the conceptual architecture and the underlying technology supporting such an enterprise.
It also requires a level of freshness, or currency, in the data, in addition to the traditional historical snapshots. This may be difficult for many traditional technologies to accommodate. The buzz phrase today is the “real-time enterprise.” What exactly does that mean? Does every piece of data have to be real time in its currency? The answer is a resounding “No.” Some data does have to be as current as we can technologically make it. However, the vast majority of data in our enterprise business intelligence environment does not. It can be hours, days, weeks, or even months old and still give us appropriate insight into the enterprise’s workings. The real task is to match the timeliness of the data to the actual business requirement at hand.
For example, a stock trader must have up-to-the-second data on stock prices to make a trade. A credit card validation, by contrast, is not real time — it may take several seconds or even a minute before the authorization code is received. The fulfillment of an order may not happen until late in the day or even the next day; this is certainly not real time. The following figure shows that all data falls somewhere along a time continuum, from real time at one end to batch at the other. Analytics and other business intelligence applications may use data from all sections of this continuum.
Figure 1: The Time Continuum of Data
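The idea of matching each data feed to its place on the time continuum can be sketched in code. This is a minimal illustration, not any particular product's API; the feed names, staleness thresholds, and tier labels are all assumptions chosen to mirror the examples above.

```python
# Illustrative sketch: placing data feeds on the time continuum by the
# maximum staleness each business requirement can tolerate.
# Feed names and thresholds are hypothetical, chosen to match the article's examples.
from datetime import timedelta

FEEDS = {
    "stock_ticks":        timedelta(seconds=1),   # trader needs up-to-the-second data
    "card_authorization": timedelta(minutes=1),   # seconds to a minute is acceptable
    "order_fulfillment":  timedelta(days=1),      # late in the day, or the next day
    "monthly_trends":     timedelta(days=30),     # traditional historical snapshots
}

def freshness_tier(max_staleness: timedelta) -> str:
    """Place a feed on the continuum from real time at one end to batch at the other."""
    if max_staleness <= timedelta(seconds=5):
        return "real time"
    if max_staleness <= timedelta(minutes=15):
        return "near real time"
    if max_staleness <= timedelta(days=1):
        return "daily batch"
    return "periodic batch"

for name, staleness in FEEDS.items():
    print(f"{name}: {freshness_tier(staleness)}")
```

The point of the sketch is that the tier is a property of the business requirement, not of the technology: only the first feed genuinely demands real-time treatment.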
Where is the technology today to support these new and vastly different forms of enterprise business intelligence data? Can we continue to use traditional relational databases and hardware platforms? Will they be able to handle the new demands efficiently? Perhaps not. These technologies are optimized for a mixed workload. They started out life as transaction processors (OLTP) — that is, handling one transaction at a time very quickly and efficiently. Our order entry, billing, general ledger, and other traditional operational systems run on these technologies. Over the years, vendors have added capabilities to these databases to handle analytic processing (OLAP) as well. The ability to return multiple records from a single query lets them support routine forms of business intelligence.
But many queries are no longer routine. Many are vastly more complicated, requiring multiple joins across massive tables and causing performance issues unheard of before now. These more complex forms of analytics put a huge strain on traditional database architectures, especially as data grows into the tens and even hundreds of terabytes.
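A back-of-the-envelope calculation shows why join cost explodes at these scales. The sketch below is illustrative only; it models the worst case of an unindexed nested-loop join, and the table sizes and filter selectivity are assumed numbers, not measurements from any real system.

```python
# Illustrative sketch: why multi-table joins over massive tables hurt.
# Without an index or early filtering, a naive nested-loop join must
# compare every row of one table against every row of the other.

def naive_join_cost(rows_a: int, rows_b: int) -> int:
    """Row comparisons for an unindexed nested-loop join of two tables."""
    return rows_a * rows_b

def filtered_join_cost(rows_a: int, rows_b: int, selectivity: float) -> int:
    """The same join after a filter keeps only a fraction of each table first."""
    return int(rows_a * selectivity) * int(rows_b * selectivity)

# Hypothetical sizes: a 100-million-row fact table joined to 10 million rows.
print(f"{naive_join_cost(100_000_000, 10_000_000):,} comparisons unfiltered")
print(f"{filtered_join_cost(100_000_000, 10_000_000, 0.001):,} after a 0.1% filter")
```

Because the cost is multiplicative, filtering each input early shrinks the work quadratically, which is exactly the leverage the paradigm described next tries to capture.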
Let’s use an analogy to help understand why these technologies may have problems. Suppose you have a huge pile of dirt (data) in which there are small flakes of gold (significant findings or events). Your job is to separate the gold from the dirt and move it to a new location where it can be converted to gold bars (trends, patterns, cause and effect models, results of complex queries, etc.). How do you do this?
Solving this problem with a traditional RDBMS is similar to having a fleet of expensive dump trucks that cannot differentiate dirt from gold. They can only move the material to the sorting station (the CPU), where the gold is slowly separated from the dirt, as shown in Figure 2. There is no mechanism to lighten the load on the trucks, and the only way to scale to larger piles of dirt is to add more trucks. Every load is treated the same way, so bottlenecks occur at the sorting station unless you add more sorting stations. In addition, because so many trucks are needed, the highway infrastructure (the network) must be large enough to support all the traffic.
Figure 2: Traditional Relational DBMS architectures
What is needed is a new paradigm. Ideally you would like to have some kind of filtering process within the processing units to remove most of the unwanted data. Then you would process only the data that is of interest rather than sorting through everything. And only the relevant data would be returned to the person issuing the request. This is the idea behind the data warehouse appliance, pioneered by Netezza.
Let’s go back to the analogy to see how this would work. If we could put some form of sieve or strainer between the pile of dirt and the truck hauling it, we could begin the sorting process earlier, separating the unusable matter from the desired gold flakes at the source. By bringing over only gold, and no extra dirt, we lighten the load that must be carted to the sorting station. This does a number of good things for us. First, because we are not dragging massive amounts of heavy dirt back and forth, we can use lightweight, less expensive pickups to do the hauling. Second, lighter pickups with smaller loads mean that the supporting infrastructure — the roads — takes less wear and tear and can actually be smaller as well. See Figure 3 for this analogy.
Figure 3: A Different Paradigm
What this means to a technologist is that by using this new paradigm, you can process complex queries faster and with less expense and not burden the servers or networks as much as you would with traditional relational technologies. It also means that the business user gets the right data in a timely fashion. This is the architecture for data warehouse appliances. The technology offered by Netezza shakes up the traditional paradigm by creating just such a sorting or filtering process at query time through the use of intelligent storage blades, combining CPUs and programmable logic with the disk drives where the data is stored. Netezza’s intelligent blades greatly reduce the load on the entire data warehousing environment.
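The contrast between the two paradigms can be simulated in a few lines. This is a toy model, not Netezza's actual implementation: the shard layout, row counts, and the "gold" predicate are all invented for illustration. It simply counts how many rows cross the "network" under each approach.

```python
# Toy simulation of the two paradigms: ship every row to a central CPU
# for filtering, vs. apply the predicate at each storage node first.
# Data, shard counts, and the predicate are hypothetical.
import random

random.seed(0)
# Four "storage blades", each holding a shard of rows; value 1 marks a "gold" row
# (roughly 1 row in 1,000).
shards = [[random.choice([0] * 999 + [1]) for _ in range(10_000)] for _ in range(4)]

def central_filter(shards):
    """Traditional model: every row crosses the network; the CPU does the sorting."""
    shipped = [row for shard in shards for row in shard]      # all rows travel
    return len(shipped), [r for r in shipped if r == 1]

def pushdown_filter(shards):
    """Appliance model: each storage node applies the predicate before shipping."""
    shipped = [r for shard in shards for r in shard if r == 1]  # only matches travel
    return len(shipped), shipped

moved_central, gold = central_filter(shards)
moved_pushdown, gold2 = pushdown_filter(shards)
assert gold == gold2  # both approaches return the same answer
print(f"rows shipped centrally: {moved_central:,}")
print(f"rows shipped with pushdown: {moved_pushdown:,}")
```

The result is identical either way; the difference is the traffic. With a selective predicate, the pushdown model moves a tiny fraction of the rows, which is why the trucks, and the roads, can be smaller.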
Hopefully you can see the importance of considering this new paradigm. Today, business intelligence must serve the entire enterprise — not just a subsection of it. This brings a tremendous increase in complexity in the data, the queries, and the infrastructure that supports them. It also means a much larger amount of data will be needed. Add the complexity of a right-time environment and you can see that a new way of constructing your business intelligence environment is needed. Yet the environment must still be easy to use and efficient for the users, and maintainable for the IT staff.
We must be able to store all data, not just subsets or summaries, to support right-time enterprise business intelligence. The increased amount of data must be used to satisfy all needs (regulatory, marketing, supply chain, etc.) in an efficient and effective manner. Response time, efficient loading times, and low total cost of ownership are critical factors in creating this environment.
Traditional database technologies may be able to get you started, but they serve two masters and may have a difficult time satisfying both optimally. In particular, the volume and complexity of today’s queries strain these technologies to their limits. Perhaps it is time for a different paradigm. For enterprise BI to truly be possible, you need an environment with no constraints — in storage, performance, number of users, query complexity, or other parameters. You also need one that does not break your wallet or your administration resources. This may require you to think “outside the box.” The data warehouse appliance may just be the ticket to help you over this hurdle.