Retrieving, aggregating or sorting a large amount of data, such as the contents of a database, and joining or updating such data, takes considerable time. To address this, the present inventor has proposed a method for retrieving, aggregating and sorting tabular format data at very high speed, and a method for joining or updating tabular format data or performing transaction processing (for example, see International Publication No. WO00/10103, JP-A-2000-339390, and JP-A-2001-043290).
The series of methods disclosed in these documents is innovative in that no index is used, unified processing can be performed, and processing can be performed even on a subset without loss of efficiency.
The tabular format data originated by the present inventor has the data structure described below. FIG. 1(a) is a view showing an example of the tabular format data. In this example, each of the record numbers in an array 100 storing the record numbers is made to correspond to the values (item values) of the items “member name”, “area” and “fan” (see reference numeral 101). The data structure originated by the present inventor holds the tabular format data shown in FIG. 1(a) in the format shown in FIGS. 2(a) to 2(c).
For example, as shown in FIG. 2(a), with respect to the item “member name”, there are provided a value list (hereinafter referred to as “VL” in some cases) 211 in which the item values are sorted in the order of the kana syllabary (alphabetical order in English letters), and a pointer array (hereinafter referred to as “PV” in some cases) 212 for the value list. For each record number, the pointer array stores a number indicating the position in the value list 211 of the element holding that record's item value. With respect to a certain item, an array group that includes a pointer array and a value list, and in some cases one or more accompanying arrays in addition to these, is referred to as an information block (see reference numeral 201). As is understood from FIG. 2(a), the pointer array contains as many elements as there are records in the tabular format data. Likewise, with respect to the item “area”, an information block 202 includes a PV and a VL, and with respect to the item “fan”, an information block 203 includes a PV and a VL.
When attention is paid to the record number “0”, since the value of the PV in the information block 201 is “5”, the item value “Green” in the VL, the storage position of which is “5”, is specified. Similarly, since the values of the PVs of the information blocks 202 and 203 are “5” and “2”, respectively, the item values “Tokyo” and “G-Team” at the corresponding storage position numbers are specified. It would be understood that these correspond to the item values contained in the record number “0” in FIG. 1(a).
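The relationship between the pointer array and the value list can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `build_information_block` and `item_value` and the sample item values are assumptions made for this example, and they do not reproduce the contents of FIG. 1(a) or FIGS. 2(a) to 2(c).

```python
from bisect import bisect_left


def build_information_block(column):
    """Build a (PV, VL) pair for one item from its per-record item values.

    VL holds each distinct item value once, in sorted order.
    PV holds, for each record number, the position of that record's
    item value within VL.
    """
    vl = sorted(set(column))
    pv = [bisect_left(vl, value) for value in column]
    return pv, vl


def item_value(block, record_number):
    """Look up the item value of a record through PV and then VL."""
    pv, vl = block
    return vl[pv[record_number]]


# Hypothetical sample data, indexed by record number (not the data of FIG. 1(a)).
areas = ["Tokyo", "Osaka", "Tokyo", "Nagoya"]
area_block = build_information_block(areas)
```

Here the PV has one element per record, while the VL has one element per distinct item value, so `item_value(area_block, 0)` follows the PV entry for record number 0 to the matching VL position.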
Retrieval, aggregation, sort and join are performed using the data structure described above, so that remarkably high-speed processing is realized.
However, it has been recognized that the following problems can arise in retrieval and similar processing using the above data structure.
Even in the case where only a small subset of all the records is made the processing object and a desired record is retrieved from that subset, the method originated by the present applicant provides an array having the same size as the VL, each element of which holds a flag indicating whether or not the corresponding item value is contained in the range of values to be retrieved. Similarly, an array having the same size as the value list is required for pass/fail judgment of the retrieval.
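The cost described here can be illustrated with a minimal Python sketch. It assumes a simple range retrieval over a record subset; the function name `retrieve_in_range` and the sample arrays are hypothetical and are not taken from the cited publications.

```python
def retrieve_in_range(pv, vl, subset, low, high):
    """Return the record numbers in `subset` whose item value lies in [low, high].

    The flag array has one element per VL entry, so building it costs
    time and memory proportional to len(vl), however small `subset` is.
    """
    # One flag per value-list element: is this item value within the range?
    flags = [low <= value <= high for value in vl]
    # Pass/fail judgment for each record of the subset via its PV entry.
    return [record for record in subset if flags[pv[record]]]


# Hypothetical information block for one item (assumed sample data).
vl = ["Nagoya", "Osaka", "Tokyo"]
pv = [2, 1, 2, 0]

# Only records 0 and 3 are the processing object.
hits = retrieve_in_range(pv, vl, subset=[0, 3], low="Osaka", high="Tokyo")
```

Although only two records are examined, the flag array is still filled for every element of the value list, which is the overhead the passage above points out.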
Also at the time of sorting or aggregating, one or more arrays having the same size as the value list are required. Further, also at the time of join processing, it is necessary to provide various arrays having the same size as the value list.
Accordingly, even in the case where retrieval, aggregation, sort and join are performed on a very small subset of a remarkably large number of records, a large memory area is needed for the arrays required for the processing whenever an information block contains a large value list. In addition, processing to place values in the array, processing to scan those values, and the like are required. Thus, there has been a problem that, although the processing object is small compared with the case where all the records are processed, the processing time cannot be correspondingly shortened.
An object of the invention is to provide a method that shortens the processing time in accordance with the size of a subset by making the size of the array coincide with the size of the subset, or with the size of a subset of a value list limited in accordance with that subset.