The increased popularity of electronic commerce and progress in technologies such as bar codes have made it possible to store sales data automatically. For example, electronic transactions over the Internet allow retail organizations, banks, and governments to store and maintain information related to customer behavior. Retail organizations, for example, may wish to store information concerning customers' buying behavior. In order to better serve customers, it may be desirable to target marketing information. In other words, it may be desirable to deduce, from stored information about past customer behavior, what type of information may interest which customers. Information regarding one customer may, for example, be deduced on the basis of the behavior of other customers identified as "similar" or "peers".
In the case of retail organizations, for example, transaction records may take the form of market basket transactions. Each market basket transaction may include an item or set of items (data value(s)) which may be bought together by a customer. For example, transaction records of a supermarket may be
{Milk, Bread, Butter}, and
{Pepsi, Diet Coke, Sprite}.
It is logical to expect that the market transactions of a particular customer will be strongly correlated with each other. For example, it is likely that the items in an individual customer's weekly shopping list are correlated from week to week. Similarly, it is possible that two individual customers exhibit correlation in their buying patterns. Suppose, for example, that customer A and customer B are each associated with the following transactions:
Customer A: {Bed_sheet, Pillow, Comforter, Pillowcase}
Customer B: {Pillow, Comforter, Pillowcase, Nightstand}.
It seems likely that customer A and customer B have correlated behavior. Thus, it may be possible to make personalized recommendations to customer A based on the behavior of customer B, and vice versa. For example, buying a "Nightstand" may be recommended to customer A, or buying a "Bed_sheet" may be recommended to customer B. Deducing the preferences of a given user by examining information about the preferences of other similar users is often referred to as collaborative filtering.
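The recommendation idea for customers A and B can be illustrated with a minimal sketch. It assumes each customer's transactions are represented as a set of item names, and that a recommendation is simply an item a peer bought that the target customer did not. The function name `recommend` is hypothetical and is not taken from any of the techniques cited in this section.

```python
def recommend(target_items, peer_items):
    """Return items bought by the peer but not by the target (assumed rule)."""
    return sorted(set(peer_items) - set(target_items))


# Transactions of customers A and B from the example in the text.
customer_a = {"Bed_sheet", "Pillow", "Comforter", "Pillowcase"}
customer_b = {"Pillow", "Comforter", "Pillowcase", "Nightstand"}

for_a = recommend(customer_a, customer_b)  # items to suggest to customer A
for_b = recommend(customer_b, customer_a)  # items to suggest to customer B
print(for_a, for_b)
```

With these two sets, the sketch suggests "Nightstand" to customer A and "Bed_sheet" to customer B, matching the recommendations described in the text.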
Collaborative filtering is possible when customers, such as customers A and B above, are determined to be "similar". "Similarity" between customers may be judged based on records in a transaction database and on a similarity criterion or objective function. Specifically, the problem of finding similar, neighboring, or peer records (similarity searching) is that of finding the data record, or set of k data records, in a database that is closest, in the sense of an objective function, to a given target value.
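The similarity search problem stated above can be sketched as a brute-force scan over the database. The text does not fix a particular objective function, so this example assumes the Jaccard coefficient (size of the intersection divided by size of the union) purely for illustration. The names `jaccard` and `k_most_similar`, and the customer identifiers "C" and "D", are hypothetical.

```python
import heapq


def jaccard(a, b):
    """Assumed objective function: |a & b| / |a | b| for two item sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def k_most_similar(target, database, k, objective=jaccard):
    """Return the k (record_id, score) pairs that score highest against target.

    `database` maps a record identifier to its set of items. Every record
    is scored, so the cost grows linearly with the size of the database.
    """
    scored = [(rid, objective(target, items)) for rid, items in database.items()]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


target = {"Bed_sheet", "Pillow", "Comforter", "Pillowcase"}  # customer A
database = {
    "B": {"Pillow", "Comforter", "Pillowcase", "Nightstand"},
    "C": {"Milk", "Bread", "Butter"},
    "D": {"Pepsi", "Diet Coke", "Sprite"},
}
peers = k_most_similar(target, database, k=1)
print(peers)
```

Here customer B is returned as the closest peer of customer A, since the two share three of five distinct items. A linear scan of this kind is only a baseline, and it does not address the efficiency concerns that motivate indexed similarity search.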
Many methods have been previously described for searching a database. In particular, several similarity search methods have been proposed. See, for example, White D. A., and Jain R., "Similarity Indexing with the SS-Tree," Proceedings of the 12th International Conference on Data Engineering, New Orleans, U.S.A., pages 516-523, February, 1996; Roussopoulos N., Kelley S., and Vincent F., "Nearest Neighbor Queries," Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1995, pages 71-79; Samet H., Design and Analysis of Spatial Datastructures, pp. 135-141, Addison Wesley, 1993. These techniques are known to be efficient for data in which records include numerical data values, and for data in which the number of possible data values associated with each record is relatively small (e.g., 5-8 data values/record). In the case of non-quantitative data values and a relatively large number of data values per data record (e.g., in the range of thousands of data values/record), however, these techniques may not be applicable.