Recent advances in data collection and storage technology have made it possible for many companies to keep large amounts of data relating to their business online. At the same time, low-cost computing power has made enhanced automatic analysis of these data feasible. This activity is commonly referred to as data mining.
One major application domain of data mining is the analysis of transactional data. In this application, database records contain information about user transactions, where each transaction is a collection of items. In this setting, association rules capture inter-relationships among items: an association rule expresses the notion of a set of items occurring together in transactions. For example, in a database maintained by a supermarket, an association rule might take the form "beer → chips (3%, 87%)," meaning that 3% of all database transactions contain both beer and chips, and 87% of the transactions that contain beer also contain chips. These two parameters are commonly called "support" and "confidence," respectively. Typically, a user controls the data mining process by setting minimum thresholds for support and confidence. The user may also impose other restrictions, such as limiting the search space of items, to guide the mining process.
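Both measures follow directly from their definitions. The Python sketch below assumes each transaction is represented as a set of item names; the function names `support_confidence` and `rule_holds` and the basket data are hypothetical, for illustration only. It counts matching transactions by brute force rather than using an efficient mining algorithm.

```python
from typing import List, Set, Tuple


def support_confidence(
    transactions: List[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of all transactions containing both sides of the rule
    confidence = fraction of transactions containing the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    both = antecedent | consequent
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def rule_holds(
    transactions: List[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
    min_support: float,
    min_confidence: float,
) -> bool:
    """True if the rule meets both user-set minimum thresholds (hypothetical helper)."""
    s, c = support_confidence(transactions, antecedent, consequent)
    return s >= min_support and c >= min_confidence


# Hypothetical toy basket data (not from any real store).
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer"},
    {"milk", "bread"},
]
support, confidence = support_confidence(baskets, {"beer"}, {"chips"})
passes = rule_holds(baskets, {"beer"}, {"chips"}, 0.03, 0.87)
```

With this toy data, the rule beer → chips has support 0.5 and confidence 2/3, so it would fail a minimum-confidence threshold of 87% even though it easily clears a 3% support threshold.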
Following the early work in Agrawal, R., T. Imielinski and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. 1993 ACM SIGMOD Intl. Conf. on Management of Data, pp. 207-216, Washington, D.C., May 1993, association rules have been extensively studied. (This paper is referred to hereafter as "Agrawal, et al 93.") However, this work treats the data as one large segment and pays no attention to segmenting the data over different time intervals. Returning to the previous example, it may be that beer and chips are sold together primarily between 6 PM and 9 PM. If the data are segmented over the intervals 7 AM-6 PM and 6 PM-9 PM, we may find that the support for the beer and chips rule jumps to 50% within the 6 PM-9 PM segment.
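The effect of time segmentation can be illustrated by computing support only over the transactions whose timestamps fall within a given interval. The sketch below assumes each transaction carries a `datetime` timestamp and treats each interval as half-open [start, end), so a purchase at exactly 6 PM falls in the evening segment. The function `segment_support`, the helper `at`, and the sample data are hypothetical and are not the method claimed in this document.

```python
from datetime import datetime, time
from typing import Dict, List, Set, Tuple

# A transaction is (timestamp, set of items purchased).
Transaction = Tuple[datetime, Set[str]]


def segment_support(
    transactions: List[Transaction],
    itemset: Set[str],
    segments: Dict[str, Tuple[time, time]],
) -> Dict[str, float]:
    """Support of `itemset` within each time-of-day segment.

    Each segment is a half-open clock interval [start, end). Support is
    computed relative to the transactions in that segment only, not the
    whole database.
    """
    result: Dict[str, float] = {}
    for label, (start, end) in segments.items():
        in_segment = [items for ts, items in transactions if start <= ts.time() < end]
        hits = sum(1 for items in in_segment if itemset <= items)
        result[label] = hits / len(in_segment) if in_segment else 0.0
    return result


def at(hour: int, minute: int) -> datetime:
    """Hypothetical helper: a timestamp on an arbitrary fixed day."""
    return datetime(2024, 1, 5, hour, minute)


# Hypothetical sample transactions.
transactions: List[Transaction] = [
    (at(9, 15), {"milk", "bread"}),
    (at(10, 30), {"beer", "chips"}),
    (at(14, 0), {"bread"}),
    (at(16, 45), {"milk"}),
    (at(18, 30), {"beer", "chips"}),
    (at(19, 10), {"beer", "chips", "salsa"}),
    (at(20, 5), {"beer"}),
    (at(20, 40), {"soda"}),
]

beer_chips = {"beer", "chips"}
# Support over the whole (unsegmented) database, for comparison.
overall = sum(1 for _, items in transactions if beer_chips <= items) / len(transactions)
by_segment = segment_support(
    transactions,
    beer_chips,
    {"7 AM-6 PM": (time(7), time(18)), "6 PM-9 PM": (time(18), time(21))},
)
```

On this made-up data, the support for {beer, chips} is 0.375 over the whole day, 0.25 in the 7 AM-6 PM segment and 0.5 in the 6 PM-9 PM segment, which mirrors the kind of jump described above. Intervals that wrap past midnight would need separate handling, since this sketch compares clock times directly.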
Prior data mining systems and methods have failed to provide an efficient, readily usable way to identify, analyze and report time-dependent associations in data.