
Introduction to data mining: Association analysis

This excerpt from Introduction to Data Mining offers a crash course on association analysis -- an effective data mining technique.

The following excerpt on data mining techniques is from Introduction to Data Mining.

Association analysis: Basic concepts and algorithms

Introduction to data mining techniques

Many business enterprises accumulate large quantities of data from their day-to-day operations. For example, huge amounts of customer purchase data are collected daily at the checkout counters of grocery stores. Table 6.1 illustrates an example of such data, commonly known as market basket transactions. Each row in this table corresponds to a transaction, which contains a unique identifier labeled TID and a set of items bought by a given customer. Retailers are interested in analyzing the data to learn about the purchasing behavior of their customers. Such valuable information can be used to support a variety of business-related applications such as marketing promotions, inventory management, and customer relationship management.

Introduction to Data Mining: Table 6.1. An example of market basket transactions.

 TID   Items
 1     {Bread, Milk}
 2     {Bread, Diapers, Beer, Eggs}
 3     {Milk, Diapers, Beer, Cola}
 4     {Bread, Milk, Diapers, Beer}
 5     {Bread, Milk, Diapers, Cola}

This chapter presents a methodology known as association analysis, which is useful for discovering interesting relationships hidden in large data sets. The uncovered relationships can be represented in the form of association rules or sets of frequent items. For example, the following rule can be extracted from the data set shown in Table 6.1:

{Diapers} --> {Beer}.

The rule suggests that a strong relationship exists between the sale of diapers and beer because many customers who buy diapers also buy beer. Retailers can use this type of rule to identify new opportunities for cross-selling their products to customers.
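As a minimal sketch of what "many customers who buy diapers also buy beer" means for Table 6.1, the Python snippet below counts the transactions that contain diapers and, among those, the ones that also contain beer. The names `transactions` and `count_containing` are illustrative choices and do not come from the book.

```python
# Table 6.1 encoded as a list of item sets (list position + 1 is the TID).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]


def count_containing(itemset, transactions):
    """Return how many transactions contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


diapers = count_containing({"Diapers"}, transactions)
diapers_and_beer = count_containing({"Diapers", "Beer"}, transactions)

# Prints: 3 of 4 diaper transactions also contain beer
print(f"{diapers_and_beer} of {diapers} diaper transactions also contain beer")
```

On this five-row sample, three of the four transactions with diapers also include beer, which is the kind of co-occurrence the rule summarizes.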

Besides market basket data, association analysis is also applicable to other application domains such as bioinformatics, medical diagnosis, Web mining, and scientific data analysis. In the analysis of Earth science data, for example, the association pattern may reveal interesting connections among the ocean, land, and atmospheric processes. Such information may help Earth scientists develop a better understanding of how the different elements of the Earth system interact with each other. Even though the techniques presented here are generally applicable to a wider variety of data sets, for illustrative purposes, our discussion will focus mainly on market basket data. 

Copyright info

Introduction to Data Mining
By Pang-Ning Tan, Michael Steinbach and Vipin Kumar
ISBN: 0-321-32136-7
Publisher: Addison-Wesley
Copyright: 2006; 769 pages

There are two key issues that need to be addressed when applying association analysis to market basket data. First, discovering patterns from a large transaction data set can be computationally expensive. Second, some of the discovered patterns are potentially spurious because they may happen simply by chance. The remainder of this chapter is organized around these two issues. The first part of the chapter explains the basic concepts of association analysis and the algorithms used to mine such patterns efficiently. The second part of the chapter deals with evaluating the discovered patterns in order to prevent the generation of spurious results.
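To give a rough sense of the first issue, the sketch below enumerates every non-empty itemset over the six distinct items in Table 6.1 and counts its occurrences by brute force. It is an illustration under assumed names (`transactions`, `all_itemsets`), not the algorithm the chapter goes on to present. The point is only that the number of candidate itemsets, 2^d - 1 for d distinct items, grows exponentially with d.

```python
from itertools import combinations

# Table 6.1 encoded as a list of item sets (list position + 1 is the TID).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

# All distinct items that appear in any transaction.
items = sorted(set().union(*transactions))


def all_itemsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


candidates = list(all_itemsets(items))

# Brute force: one pass over all transactions for every candidate itemset.
counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}

# Prints: 6 items -> 63 candidate itemsets
print(len(items), "items ->", len(candidates), "candidate itemsets")
```

Even this toy data set yields 63 candidates, and each additional distinct item roughly doubles that number, so exhaustive counting quickly becomes impractical for a real store's catalog.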
