Consider the following problem concerning the customers of a bank: within what ranges must the floating deposit balance and the age of customers fall so that those having a time deposit balance of two million yen or more make up 20% or more of the total number? The floating deposit balance and the age are continuous numeric values expressed as integers, whereas "a time deposit balance of two million yen or more" classifies each customer into one of two groups: not less than two million yen, or less than two million yen. The time deposit balance is therefore treated as a true-false attribute. Similarly, a true-false attribute may represent the answer to a question such as "Does the customer have a credit card?" or "Is the customer male?" If such a query can be solved, the bank can easily determine, for instance, to whom it should send an advertisement for a new product, and can thereby conduct its business efficiently.
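The query described above can be made concrete with a small sketch. The following Python example is illustrative only and is not taken from any of the cited methods: the `Customer` record, the function `ratio_in_range`, and the sample figures are all hypothetical. It also assumes that "the total number" refers to the customers falling within the chosen balance and age ranges.

```python
from dataclasses import dataclass

# Hypothetical customer record; the field names are assumptions for illustration.
@dataclass
class Customer:
    floating_balance: int  # floating deposit balance in yen (numeric attribute)
    age: int               # age in years (numeric attribute)
    time_deposit: int      # time deposit balance in yen

THRESHOLD_YEN = 2_000_000  # "two million yen or more" defines the true-false attribute

def ratio_in_range(customers, balance_range, age_range):
    """Return (count, ratio): the number of customers inside both inclusive
    ranges, and the fraction of them whose time deposit meets the threshold."""
    lo_b, hi_b = balance_range
    lo_a, hi_a = age_range
    selected = [
        c for c in customers
        if lo_b <= c.floating_balance <= hi_b and lo_a <= c.age <= hi_a
    ]
    if not selected:
        return 0, 0.0
    hits = sum(1 for c in selected if c.time_deposit >= THRESHOLD_YEN)
    return len(selected), hits / len(selected)

# Made-up sample data, used only to exercise the function.
customers = [
    Customer(500_000, 35, 2_500_000),
    Customer(800_000, 42, 1_000_000),
    Customer(300_000, 29, 0),
    Customer(900_000, 45, 3_000_000),
    Customer(100_000, 61, 2_000_000),
]

count, ratio = ratio_in_range(customers, (400_000, 1_000_000), (30, 50))
meets_target = ratio >= 0.20  # the 20% condition from the problem statement
```

Evaluating one candidate range in this way is simple; the difficulty addressed by the techniques discussed here lies in finding suitable ranges among the very many possible ones.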
Conventionally, studies on the fast extraction of a rule (association rule) representing a correlation between true-false attributes have been conducted in the field of data mining. These methods are described, for instance, in "Mining Association Rules Between Sets of Items In Large Databases," R. Agrawal et al., Proc. of the ACM SIGMOD Conference on Management of Data, May 1993, and in "Fast Algorithms For Mining Association Rules," R. Agrawal et al., Proc. of the 20th VLDB Conference, 1994.
Furthermore, the following conventional techniques for determining rules between two-term numeric data are also known in the art.
1. Techniques of searching for a straight line in a plane that optimally approximates a set of points, in order to find a strong linear correlation: These include, for example, least squares methods, recurrent center methods, etc. A drawback of these methods is that only a linear correlation can be found. Also, when each data item is predicted using the linear correlation, the accuracy is low if the absolute value of the correlation coefficient is 0.5 or less. Their usefulness is therefore limited.
2. Techniques for finding squares, rectangles, circles, or ellipses that contain many data points relative to their area on a two-dimensional plane, in order to find a weak global correlation: An example of these techniques is one using a computational geometry algorithm, which generally has a long calculation time. For instance, a time of more than O(M^3) may be required for circles. O(M^3) means that a calculation time of order M^3 is required, where M represents the number of data items. Further, the correlation regions that can be extracted are limited to those having a fixed shape. However, there are not many cases in which a fixed shape provides proper coverage.
3. Techniques for dividing a plane into a square mesh and extracting pixels containing many data points: The extracted pixels are often not connected, but are scattered apart from each other. It is therefore difficult to detect them as rules.
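The third group of techniques can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of any particular prior-art method: the function name `dense_pixels`, the cell size, the count threshold, and the sample points are all hypothetical.

```python
from collections import defaultdict

def dense_pixels(points, cell_size, min_count):
    """Divide the plane into square cells of side cell_size and return,
    in sorted order, the (column, row) indices of the cells that contain
    at least min_count points."""
    counts = defaultdict(int)
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return sorted(cell for cell, n in counts.items() if n >= min_count)

# Made-up points forming two separate clusters plus one isolated point.
points = [(1, 1), (2, 3), (3, 2), (41, 44), (42, 41), (43, 43), (20, 20)]

cells = dense_pixels(points, cell_size=10, min_count=3)
# cells == [(0, 0), (4, 4)]: two dense pixels that are not adjacent
```

In this example the two extracted pixels lie far apart, which illustrates why output of this kind is hard to read as a single rule.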
The above approaches also share the drawback that it is difficult to distinguish meaningful rules from meaningless ones. Generally, whether a correlation is practically meaningful must often be determined by a person. With the first two groups, meaningful correlations are easily overlooked because only correlations of special forms can be extracted. With the third group, a person cannot find any rule by looking at the output.
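The limitation of the first group can be illustrated with a short sketch. The example below fits a least squares line and computes the correlation coefficient using the standard textbook formulas; the function name `least_squares_line` and the sample data are hypothetical and chosen only for illustration.

```python
import math

def least_squares_line(xs, ys):
    """Return (slope, intercept, r) for the least squares line through the
    points, where r is the Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

xs = [-3, -2, -1, 0, 1, 2, 3]

# A perfectly linear relationship: the correlation coefficient is 1.
linear = least_squares_line(xs, [2 * x + 1 for x in xs])

# A perfectly regular but non-linear relationship (y = x^2): the fitted
# slope and the correlation coefficient are both 0, so a search for a
# linear correlation reports nothing even though y is fully determined by x.
quadratic = least_squares_line(xs, [x * x for x in xs])
```

The second case shows how a strong dependency between two numeric attributes can be overlooked when only linear correlations are sought.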
Other methods include one that divides a plane into a square mesh and segments a region containing many data points, the region being pixel-connected and x-monotone. (See, for example, "Data Mining Using Two-dimensional Optimized Association Rules: Scheme, Algorithms, and Visualization," Takeshi Fukuda et al., Proc. of the ACM SIGMOD Conference on Management of Data, pages 13-23, June 1996). Here, x-monotone means that the region is convex in the column direction, but not necessarily in the row direction. Although this method can extract a correlation having a certain meaning at high speed, it often segments a complicated region that varies dramatically in the vertical direction, and thus it is difficult for a person to grasp at a glance which portion has a strong correlation. A further disadvantage is that, because the region is x-monotone, its shape depends heavily on how the square mesh is defined (i.e., how data is distributed to each pixel).
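The x-monotone property can be checked on a pixel grid with a short sketch. The example assumes the usual geometric reading of the term, namely that the selected pixels in every column form a single unbroken run; the function name `is_x_monotone` and the sample grids are hypothetical, and the check does not test whether the region is connected across columns.

```python
def is_x_monotone(grid):
    """grid is a list of rows of 0/1 values, indexed grid[row][column].
    Return True if, in every column, the selected pixels (value 1) form
    one contiguous run, i.e. the region is convex in the column direction."""
    if not grid:
        return True
    n_rows, n_cols = len(grid), len(grid[0])
    for c in range(n_cols):
        rows = [r for r in range(n_rows) if grid[r][c]]
        # A contiguous run spans exactly as many rows as it has pixels.
        if rows and rows[-1] - rows[0] + 1 != len(rows):
            return False
    return True

# Every column holds a single run, although the second row has a gap:
# the region is x-monotone yet not convex in the row direction.
region_a = [
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]

# The first column has two separate runs, so the region is not x-monotone.
region_b = [
    [1, 0],
    [0, 0],
    [1, 0],
]

result_a = is_x_monotone(region_a)
result_b = is_x_monotone(region_b)
```

Because the property constrains only the columns, the upper and lower boundaries of such a region remain free to rise and fall from one column to the next, which is consistent with the vertically irregular shapes noted above.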