1. Field of the Invention
This invention relates to an electronic data processing method for mining an at least partially optimized association rule from a data set, and particularly for mining from the data set a plurality of separate (disjoint) such rules containing uninstantiated numeric attributes.
2. Discussion of the Related Art
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for discovering previously unappreciated aspects of the data, or, in other terms, determining correlations between attributes of a relation. Association rules are said to be optimized when they focus on the characteristics that are, in some respect, the most interesting for a particular application. Such association rules have been applied in marketing, financing, and retailing, for example.
Optimized association rules are permitted to contain uninstantiated attributes and the problem is to determine instantiations such that the support, confidence, or gain of the rule is maximized. Instantiated attributes are attributes represented by certain stated elements of the data points in the data set and may be either numerical, such as a time period, or categorical, such as a city name. Uninstantiated attributes are attributes represented by numerical variables derivable from stated numerical elements of the data points in the data set, such as time periods that are variable combinations of the stated time periods. Support, confidence, and gain will be defined hereinafter.
The volume of critical business data stored in databases is expected to grow considerably in the near future. Many organizations have become unable to extract valuable insights from their own data, partly because most of the information is stored only implicitly in the large amount of data. As a result, there is a demand for new tools that enable business organizations to gather such valuable insights.
Association rules provide a useful mechanism for discovering correlations among the underlying data. In its most general form, an association rule can be viewed as a combination or conjunction of conditions employing the data in the database and satisfying user-specified minimum support and minimum confidence constraints. A large number of different schemes for providing such association rules have been proposed. Association rules that are termed "optimized" are useful for unraveling ranges for numeric attributes where certain trends or correlations are strong. A typical limitation of such previously proposed optimized association rules is that only a single optimum interval for a single numeric attribute can be determined. Such a solution may miss some very significant or very interesting local trends in the data. For many applications, it is desirable to find a plurality of different (disjoint) optimum intervals in the data and to find them as efficiently as possible.
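By way of illustration only, the single-interval limitation described above may be sketched as follows. The sketch is not the method of the invention or of any cited scheme; it is a brute-force search written for clarity, and the data, the attribute name `hour`, and the function names are hypothetical. It also assumes one common convention, namely that the support of a rule is the fraction of rows whose numeric attribute falls in the interval and that the confidence is the fraction of those rows that also satisfy the consequent; other formulations count toward support only the rows that satisfy both.

```python
from typing import List, Optional, Tuple

# Hypothetical toy relation: (hour, satisfies_consequent) for each row.
Row = Tuple[int, bool]


def support_and_confidence(rows: List[Row], lo: int, hi: int) -> Tuple[float, float]:
    """Support and confidence of the rule 'lo <= hour <= hi -> consequent'.

    Assumed convention: support is the fraction of all rows inside the
    interval; confidence is the fraction of those rows satisfying the consequent.
    """
    in_range = [hit for hour, hit in rows if lo <= hour <= hi]
    if not rows or not in_range:
        return 0.0, 0.0
    support = len(in_range) / len(rows)
    confidence = sum(in_range) / len(in_range)
    return support, confidence


def best_single_interval(rows: List[Row], min_conf: float) -> Optional[Tuple[int, int]]:
    """Brute-force search for the one interval of maximum support whose
    confidence is at least min_conf. Returns None if no interval qualifies."""
    values = sorted({hour for hour, _ in rows})
    best: Optional[Tuple[int, int]] = None
    best_support = 0.0
    for i, lo in enumerate(values):
        for hi in values[i:]:
            support, confidence = support_and_confidence(rows, lo, hi)
            if confidence >= min_conf and support > best_support:
                best, best_support = (lo, hi), support
    return best


# Hypothetical data: the consequent holds for hours 9-11 and 13-14 only.
rows: List[Row] = [
    (9, True), (10, True), (11, True), (12, False),
    (13, True), (14, True), (15, False), (16, False),
]

relaxed = best_single_interval(rows, min_conf=0.8)
strict = best_single_interval(rows, min_conf=1.0)
```

With this toy data, a minimum confidence of 0.8 yields the single interval (9, 14), while a minimum confidence of 1.0 yields only (9, 11). In the latter case the separate run of hours 13 to 14, which also satisfies the confidence threshold, is not reported, which is the kind of local trend a single-interval rule may miss.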