1. Field of the Invention
This invention relates to a (relational) database system. Firstly, this invention relates to self-join operation processing for extracting a record satisfying a condition specified in a query from a plurality of records in a same table. Secondly, this invention relates to data mining for finding a relation among data stored in a database by the self-join operation.
2. Description of the Related Art
As a processing method of a join operation in the relational database system, xe2x80x9cNest Loop Processing Method,xe2x80x9d xe2x80x9cSort Merge Processing Method,xe2x80x9d xe2x80x9cHash Processing Method,xe2x80x9d etc. are known.
In the xe2x80x9cNest Loop Processing Method,xe2x80x9d while each part of records in a table is stored in a buffer in a memory, all records in another table are read. Then, an output record is produced by combining records which satisfy a join condition.
Since one of the tables is read repeatedly, the xe2x80x9cNest Loop Processing Methodxe2x80x9d is not efficient. If it is possible to read only records satisfying the join condition by using an index, efficiency of the xe2x80x9cNest Loop Processing Methodxe2x80x9d can be improved.
The xe2x80x9cNest Loop Processing Methodxe2x80x9d can be applied to any join condition. However, most of the join operations are equi-join operations with a join condition of equal key values. For the equi-join operations, the processing can become efficient by restricting a retrieval range of the join records by using the xe2x80x9cSort Merge Processing methodxe2x80x9d and xe2x80x9cHash Processing Method.xe2x80x9d
In the xe2x80x9cSort Merge Processing Method,xe2x80x9d record groups in both of the tables are sorted by a join key at first. Accordingly, it becomes possible to access records with an equal key value continuously. Further, a changing point of the key value is found and identified as a retrieval range boundary.
In the xe2x80x9cHash Processing Method,xe2x80x9d the record groups in both of the tables are classified into a plurality of groups according to a value obtained by applying a hash function to the join key, and the processing are divided into join operations between corresponding groups. There is a possibility that there are records with various join key values in each of the groups, however the possibility can be reduced by increasing a number of the groups. Operations of the hash function is simpler than sorting. Therefore, the xe2x80x9cHash Processing Methodxe2x80x9d has an advantage when the performance of the CPU (Central Processing Unit) is limited.
In the xe2x80x9cSort Merge Processing Methodxe2x80x9d and xe2x80x9cHash Processing Method,xe2x80x9d an input table is preprocessed, and a processing result is written in a storing unit as intermediate data. Therefore, in self-join processing where two input tables are same and join keys are same, two intermediate data with a same content are produced physically.
Accordingly, a load of processing in the self-join operation in a same table is as heavy as a load of processing in the join operation between different tables.
An association rule mining (basket analysis) is a kind of data mining, i.e., analyzing mass data statistically and finding useful rules and knowledge. The association rule mining is utilized for increasing sales through a merchandise display in a store, set sales, etc. by finding a trend in combinations of purchased items from a consumer purchase behavior history.
The association rule mining includes a phase of extracting all combinations of items (frequent itemset), which appear a determined number of times or more, from purchase history data stored in the database system and a phase of finding an association rule by considering inclusion relations among the extracted combinations. Particularly, as known, a load of processing in the former phase is heavy.
As a processing method in the phase of extracting, Apriori Algorithms proposed by R. Agrawal, et. al. are well known.
The processing method described in xe2x80x9cFast Algorithms for Mining Association Rulesxe2x80x9d in proceedings of the 20th VLDB (Very Large Data Bases) Conference, pages 487-499, 1994 is as follows.
An appearance number of each purchased item is counted in the purchase history data, and items of which appearance number reaches a certain value are extracted as frequent items.
Two different frequent items are combined as a candidate 2-itemset (xe2x80x9c2xe2x80x9d in the xe2x80x9ccandidate 2-itemsetxe2x80x9d shows a number of combined items).
An appearance number of the candidate 2-itemset in the purchase history data is counted, and the candidate 2-itemset of which appearance number reaches a certain value is extracted as a frequent 2-itemset.
If k greater than =3, following steps are repeated.
Two itemsets including one different item are chosen from frequent (kxe2x88x921)-itemsets, and an itemset of k items including each item from the both sets, i.e., candidate k-itemset, is generated.
However, an itemset including a combination of items, which is not in the frequent (kxe2x88x921)-itemset, is excluded from the candidate k-itemset.
When the candidate k-itemset becomes empty, the processing is ended.
An appearance number of the candidate k-itemset in the purchase history data is counted, and a set of which appearance number reaches a certain value is extracted as a frequent k-itemset.
For counting the appearance number of the combination of the items, only same time purchase should be counted. Since a number of items purchased at a same time is variable, it is not appropriate to represent all the items purchased at the same time by a record in the purchase history data. Therefore, as shown in (1) of FIG. 29, it is general that the record is configured for each of the items, and a combination of items purchased at a same time is represented by the records with a same transaction ID (Identification Data).
Therefore, for finding a combination of k items, it is necessary to produce a record including k items with a same transaction ID by joining the purchase history data with itself kxe2x88x921 times (self-join).
However, in an existing database system, a load of the self-join operation is heavy, and the performance is not enough. Therefore, in most of association rule mining systems such as a data mining processing method described in Japanese Unexamined Published Patent Application Hei 11-3342 (published on Jan. 6, 1999) xe2x80x9cA Group-By Processing Method,xe2x80x9d the purchase history data are extracted from the database system in advance, converted to a file in a unique form including items in a variable number as shown in (2) of FIG. 29, and processing is performed by using a special software.
In a relational database system according to the related art, since a join operation within a same table is not considered, only a processing method for different tables can be used. Therefore, it is impossible to offer a practically sufficient performance for a processing such as the association rule mining which includes many self-join operations.
However, the mining system according to the related art such as Japanese Unexamined Published Patent Application Hei 11-3342 for extracting the frequent itemset outside the database system has problems as follows.
An overhead exists in extracting the mass data.
Since data are copied, extra disk area and managing operations are necessary.
It is one of objects of this invention to solve the above-stated problems in the related art. Particularly, some of the aims of this invention are as follows.
No overhead exists in extracting the mass data.
Since data are not copied, extra disk area and managing operations are not necessary.
By improving the performance of the database system by parallelization, etc., the performance can be improved without changing the mining system.
For achieving the above aims, a content of the query is analyzed, and it is also judged if the database operation processor uses a self-join operation. If the self-join operation is used, a processing method, which is effective and efficient only for the self-join operation, is used instead of an ordinary processing method of the join operation. Accordingly, the query including the self-join operation can be processed efficiently.
According to an aspect of this invention, a database operation processor includes a database for storing data, a query analyzer for inputting and analyzing a query statement for the database, an operation expression generator for generating an operation expression for executing the query statement by inputting an analysis result of the query analyzer, and an operation expression executor for executing the operation expression generated by the operation expression generator and obtaining data satisfying the query statement from the database. The database stores a plurality of tables storing records including at least a column, and the operation expression generator includes a self-join detector for detecting join processing for joining records stored in two tables according to a determined join condition by inputting an analysis result of the query analyzer, detecting if the detected join processing is a self-join operation processing where said two tables are same and the determined join condition is a join condition for a same column, and generating a self-join operation expression for executing the detected self-join operation processing. The operation expression executor executes the self-join operation expression generated by the self-join detector and producing a result of the self-join operation processing.
According to another aspect of this invention, the database operation processor includes a database for storing data, a query analyzer for inputting and analyzing a query statement for the database, an operation expression generator for generating an operation expression for executing the query statement by inputting an analysis result by the query analyzer, and an operation expression executor for executing the operation expression generated by the operation expression generator and obtaining data satisfying the query statement from the database. The database stores a plurality of tables storing records including at least a column, and the operation expression executor includes a self-join detector for detecting a join operation expression for joining records stored in two tables according to a determined join condition by inputting the operation expression generated by the operation expression generator, and detecting if the detected join operation expression is a self-join operation processing where said two tables are same and the determined join condition is a join condition for a same column, and a self-join operation executor for producing a result of the self-join operation processing when the self-join operation processing is detected by the self-join detector.
According to another aspect of this invention, the database operation processor includes a database for storing data, a query analyzer for inputting and analyzing a query statement for the database, an operation expression generator for generating an operation expression for executing the query statement by inputting an analysis result of the query analyzer, and an operation expression executor for executing the operation expression generated by the operation expression generator and obtaining data satisfying the query statement from the database. The database stores a plurality of tables storing records including at least a column, and the query analyzer includes a self-join detector for inputting the query statement and analyzing a join processing for joining records stored in two tables according to a determined join condition and further detecting if the analyzed join processing is a self-join operation processing where said two tables are same and the determined join condition is a join condition for a same column, and outputting a self-join operation analysis result for instructing the operation expression generator to generate a self-join operation expression for executing the detected self-join operation processing. The operation expression generator generates the self-join operation expression according to the self-join operation analysis result output by the self-join detector, and the operation expression executor includes a self-join operation executor for executing the generated self-join operation expression by the operation expression generator according to the self-join operation analysis result output by the self-join detector and producing a result of the self-join operation processing.
Further features and applications of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
Other objects features, and advantages of the invention will be apparent from the following description when taken in conjunction with the accompany drawings.