1. Field of the Invention
The present invention relates to a merge and sort process method for merging a plurality of sorted data sequences so as to generate one sorted data sequence. In particular, the present invention relates to a parallel merge and sort process method using a plurality of processors and a system thereof.
2. Description of the Related Art
The parallel merge and sort process is needed for applications in databases. The basic operations for databases are, for example, sorting and retrieving records. If a parallel merge and sort process is executed by a parallel processing computer, the records can be sorted and merged at high speed.
The sorting is a process for reordering a set of records into a new sequence corresponding to a relation of particular keys of the records. These keys are referred to as sort keys. There are two ways for reordering the records during sorting: ascending order and descending order of the values of the sort keys.
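As a minimal illustration of sort keys and the two orderings, the following sketch reorders a small set of records. The record layout and the names `sort_records`, `sort_key`, and `descending` are assumptions made for this example and do not appear in the cited art.

```python
from operator import itemgetter

# Hypothetical records; each dictionary field can serve as a sort key.
RECORDS = [
    {"id": 1, "name": "carol", "age": 41},
    {"id": 2, "name": "alice", "age": 29},
    {"id": 3, "name": "bob", "age": 35},
]


def sort_records(records, sort_key, descending=False):
    """Return a new list of records reordered by the given sort key."""
    return sorted(records, key=itemgetter(sort_key), reverse=descending)


ascending_by_age = sort_records(RECORDS, "age")
descending_by_age = sort_records(RECORDS, "age", descending=True)
```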
In addition, the sorting can be categorized as internal sort and external sort. The internal sort is performed when data to be sorted is stored in a main memory, whereas the external sort is performed when data to be sorted is in an external memory. Access time of the computer to the main memory is shorter than that to the external memory. Thus, the speed of the internal sort is higher than the speed of the external sort. However, since the storage capacity of the main memory is limited, when a large number of records are sorted, they should be externally sorted in such a manner that they are stored in the external memory and sorted therein.
To properly sort such a large amount of data of records, a merge and sort process method is known. In this method, a data sequence to be sorted (this data sequence is referred to as a sort objective list) is divided into a plurality of segments. Records in each segment are sorted corresponding to sort keys. Sorted segments (sorted lists) are merged so as to generate one sorted list. In this list, records are sorted in ascending order or descending order corresponding to the values of the sort keys.
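The divide, sort, and merge sequence described above can be sketched as follows. This is an in-memory simplification under stated assumptions: the segments are plain Python lists rather than files in an external memory, the records are their own sort keys unless a `key` function is given, and the function name `merge_sort_by_segments` and the parameter `segment_size` are introduced only for this example.

```python
import heapq


def merge_sort_by_segments(records, segment_size, key=lambda record: record):
    """Sort records in ascending order by sorting fixed-size segments and merging them."""
    # Divide the sort objective list into segments.
    segments = [
        records[start:start + segment_size]
        for start in range(0, len(records), segment_size)
    ]
    # Sort each segment on its sort keys to obtain sorted lists.
    sorted_lists = [sorted(segment, key=key) for segment in segments]
    # Merge the sorted lists into one sorted list.
    return list(heapq.merge(*sorted_lists, key=key))
```

In an external sort, each sorted segment would be written to the external memory and read back during the merge; the sketch keeps everything in the main memory only to show the order of the steps.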
To increase the speed of the merge and sort process method, a parallel merge and sort process method using a plurality of processors has been proposed. To further increase the speed of the merge and sort process method, a technology for properly allocating a plurality of sorted lists to the processors at high speed is required.
To perform the merge and sort process at high speed with a plurality of processors, a plurality of sorted lists are divided into segments and allocated to processors. Thereafter, the processors perform the merge and sort process for their allocated segments.
For example, a method for sampling a proper record of a plurality of sorted lists and dividing the lists corresponding to the sampled result has been proposed as Japanese Patent Laid-Open Publication No. 2-227725, titled "Segmenting Sorted Lists (translated title)".
FIG. 1 is a schematic diagram showing an example of a parallel merge and sort process using a conventional multi-processor system. In this example, three process units 10-1 to 10-3 perform the merge process for sorted merge objective lists 11-1 to 11-3.
In this method, sort keys that are selected at random from the merge objective lists 11-1 to 11-3 are used as dividing points 12 and 13 so as to allocate the merge objective lists 11-1 to 11-3 to the process units 10-1 to 10-3. It is determined whether or not each segment divided by the dividing points 12 and 13 has nearly the same number of elements. When the determined result is "NO", other sort keys are used as the dividing points according to a predetermined procedure. Thereafter, the segments to be divided are determined.
By adjusting the dividing points 12 and 13, segments Seg1 to Seg3 are selected so that the numbers of elements thereof become nearly the same. Seg1 is allocated to the process unit 10-1. Seg2 is allocated to the process unit 10-2. Seg3 is allocated to the process unit 10-3. Thus, Seg1 to Seg3 are processed in parallel by the process units 10-1 to 10-3, respectively.
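The allocation of segments to process units can be illustrated with the following sketch. It assumes that the dividing keys have already been chosen and are given in ascending order; the random selection and adjustment of dividing points is not modeled. A thread pool stands in for the process units, and all function names are assumptions made for this example.

```python
import heapq
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def split_at_dividing_points(sorted_lists, dividing_keys):
    """Cut every sorted list at the dividing keys; segment j collects the j-th slice of each list."""
    segments = [[] for _ in range(len(dividing_keys) + 1)]
    for lst in sorted_lists:
        # Binary search locates each dividing key without scanning the list.
        cuts = [0] + [bisect_left(lst, k) for k in dividing_keys] + [len(lst)]
        for j in range(len(segments)):
            segments[j].append(lst[cuts[j]:cuts[j + 1]])
    return segments


def merge_segment(slices):
    """Merge the slices of one segment; this is the work of one process unit."""
    return list(heapq.merge(*slices))


def parallel_merge(sorted_lists, dividing_keys):
    """Merge all segments concurrently and concatenate the results in segment order."""
    segments = split_at_dividing_points(sorted_lists, dividing_keys)
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        merged = list(pool.map(merge_segment, segments))
    return [record for segment in merged for record in segment]
```

Because every key in one segment is smaller than every key in the next segment, the merged segments can simply be concatenated. The balance of the segments depends entirely on the choice of dividing keys, which is the part the conventional method must adjust.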
Other merge and sort process methods that are not parallel merge and sort process methods are disclosed as Japanese Patent Laid-Open Publication No. 57-90757 "Sort and Merge Process Method" and No. 2-75018 "Merge Process Method". The former is a merge and sort process method for effectively performing a merge and sort process with a single processor. The latter is a modification of the former method. In these methods, sort keys of records representing blocks of the sort objective list including records to be sorted are obtained. With the sort keys obtained, the records are sorted and merged.
In the merge process method using a single processor disclosed as Japanese Patent Laid-Open Publication No. 2-75018, with the system construction shown in FIG. 2, the records are merged and sorted corresponding to a flow chart shown in FIG. 3. Arrows in FIG. 2 represent flows of information. Next, the merge and sort process shown in FIG. 3 will be described step by step.
1. A record group with a predetermined number of records is read from a sort objective list 31 in a memory unit 30 to an internal sort buffer 21 (at step S1 of FIG. 3). An internal sort process unit 22 sorts the record group corresponding to sort keys and stores the sorted record group in the buffer 21 (at step S2 of FIG. 3).
2. The sorted record group stored in the internal sort buffer 21, which constitutes one sorted merge objective list, is written to an intermediate file 32 of the memory unit 30. At this point, the first record of a plurality of records of each block is selected as a representative record. The value of a sort key of the representative record and an identifier of the block are paired as one record. This record is added to an auxiliary information list 23 (at step S3 of FIG. 3). In the initial state, the auxiliary information list 23 is empty.
3. The steps 1 and 2 are continued until the entire sort objective list 31 is completely processed (at step S4 of FIG. 3). Thus, L sorted merge objective lists 11-1 to 11-L are generated in the intermediate file 32.
4. The internal sort process unit 22 sorts the records in the auxiliary information list 23 corresponding to sort keys of the records (at step S5 of FIG. 3).
5. Records in the sorted auxiliary information list 23 are read in succession from the beginning, and blocks corresponding to block identifiers are read from the intermediate file 32 to merge input buffers 24-1 to 24-L, each of which has the storage capacity for one block (at steps S6 to S8 of FIG. 3).
6. A merge process portion 25 performs an L-way merge process for L blocks stored in the merge input buffers 24-1 to 24-L and inserts records into a merge output buffer 26 in the descending order of sort key values of the records (at step S9 of FIG. 3). When one of the merge input buffers 24-1 to 24-L has been used up, the next record is retrieved from the auxiliary information list 23 and a block corresponding to the block identifier of the record is written from the intermediate file 32 to the used merge input buffer 24-i (where 1 ≤ i ≤ L) (at steps S6 to S8 of FIG. 3). When the merge output buffer 26 becomes full, the content therein is added to a sorted list 33 and the merge output buffer 26 is emptied (at steps S10 and S11 of FIG. 3). In the initial state, the sorted list 33 is empty.
For the L-way merge process, refer to "The Art of Computer Programming, Vol. 3. Sorting and Searching", by D. E. Knuth, Addison-Wesley Publishing Company Inc., 1973, pp. 252-253.
7. The step 6 is continued until all records in the auxiliary information list 23 are completely processed.
8. The merge process portion 25 performs the merge process until all the merge input buffers 24-1 to 24-L are used up (at step S12 of FIG. 3) and inserts the results in the merge output buffer 26. At this point, when the merge output buffer 26 becomes full, the content thereof is added to the sorted list 33, and the merge output buffer 26 is emptied.
9. When there is a record left in the merge output buffer 26, the content thereof is added to the sorted list 33 (at step S13 of FIG. 3) and the process is completed.
In the above-described merge and sort process, when a list 11-i (where 1 ≤ i ≤ L) to be merged is generated, records each pairing a block identifier with the sort key value of the first record of the block are generated for the blocks of the sorted merge objective lists 11-1 to 11-L and registered in the auxiliary information list 23.
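The role of the auxiliary information list in steps 5 to 8 can be sketched as follows. This is a simplified in-memory model and not the implementation of the cited publications: a dictionary stands in for the intermediate file, records are their own sort keys, the merge is done in ascending order, and the merge output buffer is omitted so that records go directly to the sorted list. The function name `merge_with_auxiliary_list` and its parameters are assumptions made for this example.

```python
from collections import deque


def merge_with_auxiliary_list(runs, block_size):
    """Merge L sorted runs, loading blocks in the order given by an auxiliary list."""
    intermediate = {}  # stands in for the intermediate file: block id -> block
    auxiliary = []     # pairs of (sort key of first record, block id)
    for run_no, run in enumerate(runs):
        for start in range(0, len(run), block_size):
            block_id = (run_no, start // block_size)
            block = run[start:start + block_size]
            intermediate[block_id] = block
            auxiliary.append((block[0], block_id))
    auxiliary.sort()   # sort the auxiliary list on the representative keys
    pending = deque(auxiliary)

    # Fill up to L merge input buffers, one block each, in auxiliary-list order.
    buffers = []
    for _ in range(len(runs)):
        if pending:
            _, block_id = pending.popleft()
            buffers.append(deque(intermediate[block_id]))

    sorted_list = []
    while buffers:
        # L-way merge step: take the smallest record at the head of any buffer.
        i = min(range(len(buffers)), key=lambda j: buffers[j][0])
        sorted_list.append(buffers[i].popleft())
        if not buffers[i]:
            if pending:
                # Refill the used-up buffer with the next block named by the list.
                _, block_id = pending.popleft()
                buffers[i] = deque(intermediate[block_id])
            else:
                del buffers[i]
    return sorted_list
```

The sketch selects the smallest head record by a linear scan over the buffers for brevity; a selection tree or heap, as in the cited L-way merge, would serve the same purpose for large L.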
According to the conventional system, blocks to be merged can be effectively selected. However, the merge process is not performed in parallel by a plurality of processors. Thus, to increase the speed of the merge process in the conventional system, along with the technology shown in FIG. 1, an advanced technology for properly segmenting the sorted merge objective list and for allocating it to a plurality of processors is required.
In the conventional parallel merge and sort process method shown in FIG. 1, to obtain one dividing point, the process for sequentially reading records from each of a plurality of sorted merge objective lists 11-1 to 11-3 should be repeated a plurality of times. However, since the sorted lists are stored in a slow memory unit such as a secondary memory unit, this process takes a long time when a large number of records are processed, as in a database system. This drawback prevents the sort speed of records from increasing.