This article originally appeared on the BeyeNETWORK
Historically, data warehouse efforts have been pursued by Fortune 1000 companies. These are large organizations with vast resources and with leading-edge technical savvy at the ready. The literature is rife with articles about companies that have undertaken data warehouse efforts with seven- and eight-digit budgets, and with project timelines spanning several years.
But it doesn’t have to be that way.
Medium-sized companies ($50 to $500 million in revenue) have the same information needs as large organizations, but have fewer resources available to them, both human and financial. The development of high-performance yet low-cost servers and the addition of data warehouse features to mass-market relational database management systems (RDBMSs) have brought sophisticated data warehouse hardware and software within the range of companies with only modest resources. In addition, advances in architecture and more mature development approaches have reduced the time required before the company begins to realize tangible benefits from having a data warehouse. As a result, the revenue increases from the “low-hanging fruit” can help fund follow-on iterations.
What Is a Data Warehouse?
A data warehouse is an environment that contains the key data that executives and managers within an organization use to support their decision making. This environment is generally hosted on a server that is physically separate from the operational computing resource. It contains data that has been cleansed, validated for logical consistency and transformed into structures that are easy for non-technical managers to use. The data is labeled in plain-English business vocabulary, and it is stored in a special way. While operational applications use highly normalized data structures that non-technical individuals often find difficult to navigate, the user view of data in a data warehouse is frequently stored in a dimensional structure. Pertinent facts about the company and its products, supply chain and customers can be displayed and summarized easily by the categories (dimensions) that the executives most commonly use in their personal analysis efforts.
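To make the idea of a dimensional structure concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse database. The table names (fact_sales, dim_product, dim_region), the columns and the sample rows are all hypothetical and chosen only for illustration. They do not come from any particular product or project.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE dim_region (
            region_key  INTEGER PRIMARY KEY,
            region_name TEXT
        );
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product (product_key),
            region_key  INTEGER REFERENCES dim_region (region_key),
            sale_amount REAL
        );
        """
    )
    # Illustrative sample data only.
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO dim_region VALUES (?, ?)",
        [(1, "East"), (2, "West")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 20.0), (1, 2, 30.0)],
    )


def sales_by_category_and_region(conn):
    """Summarize the facts by two business dimensions."""
    return conn.execute(
        """
        SELECT p.category, r.region_name, SUM(f.sale_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_region  AS r ON r.region_key  = f.region_key
        GROUP BY p.category, r.region_name
        ORDER BY p.category, r.region_name
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for row in sales_by_category_and_region(connection):
        print(row)
```

The point of the sketch is the shape of the query: the business user asks for sales "by category, by region" and the dimension tables supply exactly those labels, without the user having to navigate a normalized operational schema.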
Why Is a Data Warehouse Needed?
Most data warehouse projects have been sponsored and initiated by IT to achieve the benefits that accrue to an organization from a technical perspective. However, the most successful and effective implementations of data warehouse technology have been sponsored by business unit executives to achieve solid business objectives.
The IT Perspective
Without a data warehouse, all reporting and business analysis functions must be supported on the operational computing resource of the company. However, operational transaction processing, periodic reporting functions and ad hoc business analysis are three very different applications, which place very different demands on a computing environment.
Operational transaction processing is the lifeblood of the company. OLTP applications are typically optimized for write performance – hence the use of highly normalized database structures. Transaction processing workload is often predictable across the workday and across the calendar year. Immediate response to a transaction is pivotal to operational efficiency for the business. Availability of the OLTP resource is extremely important. Consequently, automated failover capabilities are common.
Periodic reporting functions typically follow a predictable schedule. Certain standard reports are run overnight. Others are run at month end, quarter end, and fiscal year end. Because it is common for standard reports to perform vast amounts of I/O, report requests are typically queued and dispatched so that only a few of them run at one time. Service levels are typically defined within delivery windows that span a few hours.
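The queue-and-dispatch pattern described above can be sketched in a few lines. This is an illustrative example only, assuming a thread pool is an acceptable stand-in for a real batch scheduler. The names run_report, dispatch_reports and ConcurrencyTracker are hypothetical, and the short sleep merely simulates I/O-heavy report work.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Records how many reports run at once, so the limit can be verified."""

    def __init__(self):
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.running += 1
            self.peak = max(self.peak, self.running)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        with self._lock:
            self.running -= 1
        return False


def run_report(name, tracker):
    """Hypothetical report job; the sleep stands in for heavy I/O."""
    with tracker:
        time.sleep(0.01)
        return f"{name}: done"


def dispatch_reports(report_names, max_concurrent=2):
    """Queue every report, but let only max_concurrent run at one time."""
    tracker = ConcurrencyTracker()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # Requests beyond max_concurrent wait in the pool's internal queue.
        results = list(pool.map(lambda n: run_report(n, tracker), report_names))
    return results, tracker.peak


if __name__ == "__main__":
    results, peak = dispatch_reports(["daily", "month_end", "quarter_end"])
    print(results, "peak concurrency:", peak)
```

Capping the number of simultaneous jobs trades longer total elapsed time for a bounded I/O load, which suits service levels defined as delivery windows of a few hours.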
Ad hoc business analysis, by contrast, is unpredictable by its very nature. It can be done at any time of the day or night. It is also fraught with risk: ad hoc analysis queries frequently trigger full table scans, which is not the kind of thing that you want to run directly against the operational system database. Yet it is highly important to the business. When executives and managers perform business analysis, they are learning WHY things have happened. They are also learning new facts about their company, customers, markets and suppliers, and discovering new correlations between these entities. Business analysis affects the strategy that is employed at the highest levels of the organization.
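The difference between a keyed transaction lookup and an ad hoc analysis query that scans a whole table can be seen in a database's query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for an operational database. The orders table and both queries are hypothetical, and other database engines report their plans in different formats.

```python
import sqlite3


def make_orders_db():
    """Build a small, hypothetical operational table keyed by order_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 10, float(i)) for i in range(1, 201)],
    )
    return conn


def query_plan(conn, sql):
    """Return the plan steps SQLite reports; the description is the last column."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]


def is_full_scan(conn, sql):
    """True when any plan step reads a whole table instead of seeking by key."""
    return any(step.startswith("SCAN") for step in query_plan(conn, sql))


if __name__ == "__main__":
    db = make_orders_db()
    # Typical transaction-style lookup by primary key.
    print(query_plan(db, "SELECT amount FROM orders WHERE order_id = 42"))
    # Typical ad hoc question: filters on a column that has no index.
    print(query_plan(db, "SELECT SUM(amount) FROM orders WHERE amount > 100"))
```

On a 200-row table the scan is harmless. On a production table with millions of rows, the same plan competes with transaction processing for I/O, which is the risk described above.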
If there is no data warehouse – if all three types of processing are supported in a single processing environment – either one or more of the types of processing is poorly served, or the processing resource is over-sized for the organization.
For a medium-sized company, the IT benefits of data warehousing relate to IT's ability to deliver an analysis capability while maintaining an efficient operational environment. Establishing a data warehouse:
- Protects the operational transaction processing computing resource from unpredictable end-user reporting and analysis demand.
- Allows IT to defer upgrades of high-cost OLTP processing resources by reducing workload on the “production” resource.
- Provides a clean, QA’d and validated source of logically consistent business performance data, providing a single version of “the truth.”
- Provides a single place for developers to go to get the data needed to fulfill end-user requests.
- Allows IT to respond more quickly to reporting requests from business units.
- Potentially moves query and analysis processing cycles to a less expensive platform.
The Business Perspective
Technical staff frequently fail to appreciate how difficult it is for non-technical business unit end users to understand the highly normalized, entity-relationship structures that are typical of operational systems. When IT attempts to deliver end-user reporting capabilities based on these structures, it is common for only one or two users to “get it.” The rest of the users learn who those few people are and come to depend on them.
However, the dimensional structures that are created in the data mart portion of the data warehouse are inherently easier for the non-technical user to understand. That is because the dimensions structure the data the way the business user thinks. This improvement in ease of use means that executives and managers get more data in a timely manner. This equates to greater overall organizational effectiveness:
- We make better business decisions by having better information that is more readily available.
- We create an environment to exploit the company’s digital assets: identify trends more easily, find out-of-pattern events more readily, and seek the nuggets of information that will lead to business insight.
- We provide a clean, QA’d, and validated source of logically consistent business performance data, providing a single version of “the truth.” Everyone in the organization agrees on a single, consistent set of numbers.
How to Develop an Affordable Data Warehouse
Two factors affect the ability of a medium-sized company to deploy a data warehouse quickly and cost-effectively. First, the cost of the infrastructure needed to support a data warehouse has declined dramatically in recent years. Second, through years of training and discipline, we have learned to follow iterative methodologies that deliver tangible benefits to our clients early and often.
The Infrastructure Factor
Prices on commodity servers have declined significantly, while at the same time, their raw processing power, I/O bandwidth, disk storage and memory capacity have increased. Similarly, in recent years, mass-market relational database management systems (RDBMSs) have added data warehousing features and capabilities at no added cost to the client. This combination of increased capability with reduced costs allows the medium-sized company to establish an effective data warehouse infrastructure without breaking the bank.
The Methodology Lesson
Whether we are building an enterprise data warehouse or an architected data mart solution, we have learned that an iterative process with frequent deliverables works best. Typically, each iteration takes about 90 days and ends with the implementation of end-user functionality that has true value for the business.
The first area chosen for development is normally the area that management believes will have the greatest return for the organization. In practice, it often represents an area that is currently causing the greatest amount of pain for executive management.
When this iteration is delivered, the business immediately begins to realize the benefits of the data warehouse. Then the next iteration is conducted, and the next. Often, the benefits that accrue from the initial work efforts provide quantifiable savings and increased revenue to fund subsequent iterations.
Developing a data warehouse should be within the scope of any medium-sized company’s IT organization. It makes the business more effective by improving the executive decision-making environment. And it can lower operational business and IT costs by off-loading the I/O and processing demand from operational computing resources onto a lower-cost processing environment. The company achieves improved management effectiveness and operational efficiency at the same time.
Factors that had previously posed significant barriers to development of a data warehouse and business intelligence capability are no longer at work.
Can a medium-sized company afford a data warehouse? Absolutely. The benefits to be gained, combined with the reduced cost of the infrastructure and the short time horizon for tangible payback, make the argument compelling.