This article originally appeared on the BeyeNETWORK.
One of the most important decisions to make when deciding on a business intelligencesolution is what type of data integration you will use. The answer to that question is complex and multi-faceted. It is dependent upon the type of solution you are implementing, the knowledge of your in-house IT department and the amount you are able to spend on a data integration solution.
To begin, there are three basic types of integration architecture: Enterprise Application Integration (EAI), Enterprise Information Integration (EII) and Extract, Transform and Load (ETL). This article aims to define these architectures and the appropriate usage of each.
Enterprise Application Integration
EAI is used to tie applications together, either within a company or between companies. The main architectures used in EAI are point-to-point and publish/subscribe. Point-to-point means tying two applications together and publish/subscribe integration involves one application publishing the data and other applications subscribing to the data. In the publish/subscribe architecture, neither application is aware of the other.
If, for example, you need to get the data from an order entry system to a separate shipping department system, or if you are receiving orders from an external company and need to incorporate it in your own system, you require EAI. While this can be accomplished using various technologies, perhaps the easiest and quickest to implement is web integration. An example would be an enterprise portal. The enterprise portal would contain triggers that are fired when a user pushes the correct buttons.
A much more difficult method is to use messaging. This requires modifications to the applications themselves and usually results in XML messages being generated. The receiving application must read the XML. Messaging can be synchronous or asynchronous, which is basically the same thing as point-to-point vs. publish/subscribe. Message level integration is more responsive and may make sense if you need a fast response time. Messages are interesting in that they are fired based on business logic. This requires cataloging and defining business events.
How do you define a requirement for application integration? An indicator is manually entering data in a system which exists in another system. Again, as an example, assume an order was received in an order entry system in a retail store. If a record has to be manually entered into an inventory system to remove the products which are being shipped to the customer, or if a record has to be entered into the shipping program to send the items out, these systems are not integrated and a prime opportunity arises for integration. Even if the data is downloaded from the order entry system in a batch and later uploaded to the inventory system, if it requires manual intervention it would be a candidate for integration. If data does not flow automatically through the entire chain, then application integration is required. The more time spent on manual entry, the more critical the need for EAI.
Another obvious need for application integration is the business-to-business (B2B) integration. When receiving products from a vendor, it is rare that a company will use the same layout in their system that the vendor used. Therefore, data integration is needed to get the information from the vendor into the inventory system.
Enterprise Information Integration
The second integration architecture is EII, or Enterprise Information Integration. This type of integration pulls data from multiple sources. EII is used when real-time answers are required. Rather than pulling the data from source systems and loading it into another system, as is done in data warehousing, the data is left in the source systems. The integration tool creates a sort of virtual database. This process is also known as data federation. Data is mapped from the source systems using both business and technical logic. Source systems can include not only the operational systems, but data warehouses and data marts, as well.
The trade off in developing EII solutions rather than data warehouses is the reason why data warehouses were originally developed. Advanced queries against the operational systems can cause system degradation and seriously intrude on operations. As well, complicated queries can take several hours. Even in a data warehouse, a complex query can take hours. Imagine what that would do to your operational system. In a transactional system, it simply is not possible to run complex queries. Nevertheless, there is a need for real-time reports, and an EII solution will probably be necessary, even if you do have data warehouses or data marts. These two technologies compliment one another. EII does require a tool, as the whole point is to pull information from systems that have data in different formats. If the tool is flexible, new systems can be added without difficulty, and existing systems can be modified as necessary.
An EII solution is indicated when there are several different source systems containing important operational data. In order for management to make day-to-day decisions, this data must be integrated and the integration solution must deliver real-time results.
Extract, Transform and Load
The third and final type of integration is ETL, Extract, Transform and Load. Within the extract and transform is the data quality or cleanse process, which is used to remove bad or unnecessary data. ETL is used to pull data from operational systems to load into a data warehouse, data mart or operational data store. The data is extracted from the operational systems, transformed into the proper structure and loaded into the new database. The transformation process should include a metadata repository which keeps a map of the transformations. Care should be taken to document both the technical and business logic involved in transforming the data.
If the data mart to be built is small, or the transforms are not too complex, custom database scripting or even shell scripting may be used to build the ETL. Learning an ETL tool requires time and expense which may not be appropriate for a small application. If the data is going into an operational data store with little or no transformation, there again, an ETL tool is probably unnecessary. A combination of database triggers, stored procedures and shell scripting may work perfectly fine, so long as the process is well documented and automated.
However, if there is sufficient complexity or a large process, it makes sense to use an ETL tool. If the in-house IT staff is small, not knowledgeable in scripting or if turnover is high, an ETL tool is also a wise investment. Managing a graphical interface is much easier than managing custom scripts. Trying to follow the data flow in many separate script files is extremely frustrating and time consuming. Documentation is intrinsic in the tool, which is a boon to all developers.
The data warehouse or data mart is usually loaded via batches on a weekly or monthly schedule. This data is most often used for historical analyses, where the federated data system is used for real-time analyses. Because the data warehouse holds historical data, load times may lengthen as the data warehouse grows. Technologies are constantly evolving to shorten the load cycle to offset the longer processing time.
There is a need for every type of integration architecture in most organizations. No matter which form of integration you use, you will find that it exposes the need for other types of integration. A good starting point is to analyze your business workflow to highlight your weak areas; then decide which point is the major bottleneck. While working on this area of integration, keep the other integration points in mind. What you do in one area may impact what you will need to do in another.
Build in as much flexibility as possible. Avoid point-to-point architectures where publish/subscribe architectures will work. As with any software development, reusability is a key to quicker reiteration of the process. Avoid, as much as possible, hard coding or hand scripting. The time spent in training on new tools will be more than offset in the time saved in maintenance. Finally, follow a development methodology. Lay out the requirements and scope of the project. All parties should agree on the business process flow, the meaning of the data and the scope and requirements of the project.