The following excerpt on data mining techniques is from Introduction to Data Mining.
Association analysis: Basic concepts and algorithms
Many business enterprises accumulate large quantities of data from their day-to-day operations. For example, huge amounts of customer purchase data are collected daily at the checkout counters of grocery stores. Table 6.1 illustrates an example of such data, commonly known as market basket transactions. Each row in this table corresponds to a transaction, which contains a unique identifier labeled TID and a set of items bought by a given customer. Retailers are interested in analyzing the data to learn about the purchasing behavior of their customers. Such valuable information can be used to support a variety of business-related applications such as marketing promotions, inventory management, and customer relationship management.
This chapter presents a methodology known as association analysis, which is useful for discovering interesting relationships hidden in large data sets. The uncovered relationships can be represented in the form of association rules or sets of frequent items. For example, the following rule can be extracted from the data set shown in Table 6.1:
{Diapers} → {Beer}
The rule suggests that a strong relationship exists between the sale of diapers and beer because many customers who buy diapers also buy beer. Retailers can use this type of rule to help them identify new opportunities for cross-selling their products to customers.
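As a rough illustration of what "many customers who buy diapers also buy beer" means in terms of counting, the sketch below tallies how often the two items appear together. The transactions are made up for this example and are not the contents of Table 6.1; the helper names `count_containing` and `rule_strength` are likewise hypothetical.

```python
# Hypothetical market basket transactions, invented for illustration only.
# These are NOT the rows of Table 6.1.
transactions = [
    {"bread", "milk"},
    {"diapers", "beer", "bread"},
    {"diapers", "beer", "milk"},
    {"diapers", "milk"},
    {"beer", "cola"},
    {"diapers", "beer", "eggs"},
]


def count_containing(transactions, items):
    """Return the number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= basket)


def rule_strength(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`.

    Returns 0.0 when no transaction contains the antecedent.
    """
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0
    with_both = count_containing(transactions, set(antecedent) | set(consequent))
    return with_both / with_antecedent


diaper_buyers = count_containing(transactions, {"diapers"})
diapers_and_beer = count_containing(transactions, {"diapers", "beer"})
share = rule_strength(transactions, {"diapers"}, {"beer"})

print(f"{diapers_and_beer} of {diaper_buyers} diaper baskets also contain beer ({share:.0%})")
```

On this toy data, three of the four baskets that contain diapers also contain beer. A real analysis would run the same kind of count over the full transaction table.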
Besides market basket data, association analysis is also applicable to other application domains such as bioinformatics, medical diagnosis, Web mining, and scientific data analysis. In the analysis of Earth science data, for example, association patterns may reveal interesting connections among the ocean, land, and atmospheric processes. Such information may help Earth scientists develop a better understanding of how the different elements of the Earth system interact with each other. Even though the techniques presented here are generally applicable to a wide variety of data sets, for illustrative purposes, our discussion will focus mainly on market basket data.
There are two key issues that need to be addressed when applying association analysis to market basket data. First, discovering patterns from a large transaction data set can be computationally expensive. Second, some of the discovered patterns are potentially spurious because they may happen simply by chance. The remainder of this chapter is organized around these two issues. The first part of the chapter is devoted to explaining the basic concepts of association analysis and the algorithms used to efficiently mine such patterns. The second part of the chapter deals with the issue of evaluating the discovered patterns in order to prevent the generation of spurious results.
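To give a feel for the first issue, consider a naive approach that treats every nonempty set of items as a candidate pattern. This brute-force enumeration is an assumption made here for illustration, not the algorithm the chapter goes on to present. With d distinct items there are 2^d - 1 such sets, so the number of candidates doubles with each item added. The function names below are hypothetical.

```python
from itertools import combinations


def all_itemsets(items):
    """Yield every nonempty subset of `items` (brute-force enumeration)."""
    ordered = sorted(items)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def candidate_count(num_items):
    """Number of nonempty itemsets over `num_items` distinct items: 2^d - 1."""
    return 2 ** num_items - 1


# Enumerating is feasible only for tiny item catalogs.
small_catalog = {"beer", "bread", "diapers"}
enumerated = list(all_itemsets(small_catalog))
print(len(enumerated), candidate_count(len(small_catalog)))

# The closed-form count shows how quickly the search space grows.
for d in (10, 20, 40):
    print(d, candidate_count(d))
```

Even a catalog of a few dozen items yields far more candidate sets than could be checked one by one against a large transaction table, which is why efficient mining algorithms matter.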
This was first published in February 2006