The present invention relates to a database method, system and computer program for making reference to a table. More particularly, the present invention relates to a database retrieval processing method, system and computer program for simultaneously executing a plurality of retrieval processings.
A data warehouse is a practical way to make use of the expansion of the parallel server market for commercial applications. Typically, it is an enterprise data system of three hierarchical structures made up of an enterprise data warehouse for storing tremendous amounts of data, a section data warehouse for storing data picked up for each of the sections, and a number of client terminals.
In this data warehouse, a multidimensional analysis is frequently carried out in order to analyze a table of raw data from various angles. For this purpose, the retrieval is repetitively executed for a table under various conditions. In the database system, these retrievals have been executed separately. The data warehouse holds raw data in tremendously large amounts on a tera Byte (TB) scale, and extended periods of time are required for scanning the whole data. When the retrieval is repetitively executed under various conditions, therefore, the whole raw data must be scanned many times repetitively, requiring very long processing times.
FIG. 3 illustrates an SQL sentence of a query including a plurality of retrieval processings. FIG. 3 includes two retrieval processings Q1 and Q2 designated by a WITH phrase. In the retrieval processing Q1, a table T1 and a table T2 in which the column D3 is smaller than 100 are joined together with a column C1 as a key, the columns C1 and C2 form a group, and a total of the columns D1 and D2 is found for each of the groups. In the retrieval processing Q2, a table T2 in which the column D4 is smaller than 100 and a table T3 are joined together with the column C3 as a key, the columns C2 and C3 form a group, and a total of the columns D1 and D2 is found for each of the groups.
In the SQL of FIG. 3, the processings of Q1 and Q2 are rounded as a result of retrieval through UNION ALL. In this case, an identifier "Q1" and an identifier "Q2" are attached to the head to indicate whether the result is form the retrieval processing Q1 or the retrieval processing Q2. The SQL includes a processing for joining the tables T1 and T2 together to form a group, and a processing for joining the tables T2 and T3 together to form a group. When the query is executed according to a conventional system, the table T2 is scanned twice. FIG. 2 illustrates the flow of processing in this case.
FIG. 2 illustrates a case where tables T1, T2 and T3 stored in a plurality of secondary storage units 28 to 30 are read by a scan back-end server (Scan BES) 6, are transferred to join back-end servers (Join BES) 4 and 5 and are treated. The scan back-end server 6 has a database buffer 17 for caching the data read out from the secondary storage onto the main memory, and independently executes a total of four table scanning processings, i.e., table scanning processings 14 and 31 for joining the tables T1 and T2 together, and table scanning processings 32 and 16 for joining the tables T2 and T3 together.
In the processing of FIG. 2, the scanning processing of table T2 is executed twice in an overlapped manner. Therefore, the table scanning processing for taking the data from the database buffer must be executed twice in an overlapped manner. Moreover, when there is a difference in the processing speed between the T2 scanning processings 31 and 32, the database buffer 17 loses the cache effect of table T2 that is to be referred to by the scanning processings. In the worst case, the I/O processing for the secondary storage is issued twice.
In order to solve these problems according to the prior art, the simultaneous execution is carried out when the same data are to be accessed among a plurality of retrieval processings as disclosed in Japanese Patent Application No. 8-6829, so that the data taken in by the database buffer through one time of I/O can be utilized among the plurality of retrieval processings.
In the conventional simultaneous execution control system for a plurality of retrieval processings, it is necessary to execute the simultaneous control operation among the retrieval processings in order to heighten the hit rate of the database buffer. However, the queries are thrown into the database system out of synchronism, and a large overhead is required for controlling them in synchronism. Besides, when there is a difference in the rate for scanning the data among the retrieval processings, the data held in the database buffer cannot be used in common unless the synchronizing control operation is frequently executed.
In the conventional system, furthermore, the processing is commonly executed for reading the data from the secondary storage into the database buffer, but the processing for reading the data from the database buffer to a local buffer in each retrieval processing is executed for each of the retrieval processings. Therefore, the processing is not commonly carried out to a sufficient degree.
As means for commonly issuing a plurality of retrieval processings as a series of queries, there have been proposed a stored procedure of SQL and a WITH phrase standardized with SQL3. In the multidimensional analysis of data warehouse, a plurality of retrieval processings are issued simultaneously and can be rounded into a one time of query by a means which uses the stored procedure or the WITH phrase.
As apparatus for commonly issuing a plurality of retrieval processings as a series of queries, there can be further developed apparatus which buffers and commonly issues a plurality of SQL sentences in response to a query reception server of the database system in addition to the abovementioned apparatus.
A conventional system disclosed in "The Design and Implementation of a Sequence Database System" by P. Seshadri, et al., Proceedings of the 22nd VLDB Conference, September 1996, pages 99-110 arranges common retrieval processings into a single retrieval processing and executes a search with respect to the single retrieval. However, the disadvantage of this conventional system is that the search result is stored as a temporal list and the temporal list is later read repeatedly by the original retrieval processings.