1. Field of the Invention
This invention relates to a data sort process, and more particularly to a sort processing method of and apparatus for sorting all data by forming a plurality of sorted data blocks based on input data, temporarily fetching the blocks into a plurality of work buffers, merging data records while sequentially reading out the data records from the plurality of work buffers and outputting the same.
2. Description of the Related Art
In the sort process, if the amount of all data to be sorted is small and all of the data can be developed in a main memory, all of the to-be-sorted data may be developed in the main memory and the data may be sorted in a sort order in the main memory and output.
However, in many cases, the amount of all of the to-be-sorted data is larger than the capacity of the main memory and it is impossible to develop all of the to-be-sorted data in the main memory. If all of the to-be-sorted data cannot be developed in the main memory, sorting of all of the data is generally effected as follows.
As shown in FIG. 1, in the first step S1, data is sequentially read out from an input file F1, in which the to-be-sorted data is stored, in units of the amount of data which can be developed in the main memory. The data in each unit is sorted to create a plurality of data blocks D1, each constructed by a discretely sorted data group, and the data blocks are stored into an external file F2.
In the second step S2, data items of the plurality of discretely sorted data blocks D1 created in the first step S1 are read out from the external file F2 and merged so that all of the data is sorted, and the result of the sort is stored into an output file F3. In a normal case, each of the input file F1 and output file F3 is part of the external file F2.
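The two steps S1 and S2 can be sketched as follows. This is a minimal illustration only, not the method of either cited publication: Python lists stand in for the files F1, F2 and F3, and the names `create_sorted_blocks`, `merge_blocks` and `memory_capacity` are hypothetical. The merge here uses the standard-library `heapq.merge`, which is a generic multi-way merge.

```python
import heapq
from typing import Iterable, List


def create_sorted_blocks(records: Iterable[int], memory_capacity: int) -> List[List[int]]:
    """Step S1 (sketch): read as many records as fit in memory, sort them,
    and store each sorted group as one data block."""
    blocks: List[List[int]] = []
    chunk: List[int] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == memory_capacity:
            blocks.append(sorted(chunk))
            chunk = []
    if chunk:  # last, possibly shorter, block
        blocks.append(sorted(chunk))
    return blocks


def merge_blocks(blocks: List[List[int]]) -> List[int]:
    """Step S2 (sketch): merge the discretely sorted blocks into one sorted output."""
    return list(heapq.merge(*blocks))


def external_sort(records: Iterable[int], memory_capacity: int) -> List[int]:
    return merge_blocks(create_sorted_blocks(records, memory_capacity))


if __name__ == "__main__":
    print(external_sort([5, 3, 8, 1, 9, 2, 7], memory_capacity=3))
```

With a capacity of three records, the input above is split into the blocks `[3, 5, 8]`, `[1, 2, 9]` and `[7]` before being merged.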
As merge methods usable in the second step S2, there are known, for example, a method disclosed in Japanese Patent Disclosure (KOKAI) Publication No. S.57-90757 (this merge method is hereinafter referred to as a "block merge") and a method disclosed in Japanese Patent Disclosure (KOKAI) Publication No. H.2-75018 (this merge method is hereinafter referred to as a "buffer merge").
In the block merge and buffer merge, for example, when a plurality of data blocks D1 constructed by the sorted data groups are created, index information is also created. The index information is constructed by indices, that is, the data records having the highest sort order in the respective data blocks; since each data block is discretely sorted, these representative records correspond to the head data records of the respective data blocks. At the same time, a plurality of sorted data strings are created at the time of creation of the data blocks D1. Each sorted data string is formed such that, when the data blocks belonging to the string are set in the index order, all of the records constructing those data blocks are arranged in the sort order throughout the data blocks of the string. In contrast, even if data blocks of different strings are arranged in the index order, the records in those data blocks are not always arranged in the sort order.
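The relation between indices, data blocks and sorted data strings can be illustrated with a small sketch. The data layout is an assumption made for illustration (a string is a list of blocks, a block is a sorted list of integer records), and `build_index` and `concat_in_index_order` are hypothetical helper names.

```python
from typing import List, Tuple

Block = List[int]
SortedString = List[Block]


def build_index(strings: List[SortedString]) -> List[Tuple[int, int, int]]:
    """Return (head_record, string_no, block_no) entries in index order.
    The head record of a sorted block serves as its representative record."""
    return sorted(
        (block[0], s, b)
        for s, blocks in enumerate(strings)
        for b, block in enumerate(blocks)
    )


def concat_in_index_order(strings: List[SortedString]) -> List[int]:
    """Lay the blocks end to end in index order, without merging records."""
    return [rec for _, s, b in build_index(strings) for rec in strings[s][b]]


if __name__ == "__main__":
    # Two strings of two blocks each; each string is sorted end to end.
    strings = [[[1, 4], [6, 9]], [[2, 3], [5, 8]]]
    print(build_index(strings))            # [(1, 0, 0), (2, 1, 0), (5, 1, 1), (6, 0, 1)]
    print(concat_in_index_order(strings))  # [1, 4, 2, 3, 5, 8, 6, 9]
```

Within either string the records are in sort order, but placing blocks of different strings in index order yields `1, 4, 2, 3, ...`, which is why the records of the blocks still have to be merged.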
In the block merge and buffer merge described above, data records in the respective data blocks of the sorted strings are merged.
As indicated by an example shown in FIG. 2, in the block merge, before data is read out or fetched from a temporary file F4 which is an external file into a work buffer B1 in the main memory, index information constructed by representative records corresponding to the head data records of the respective data blocks is stored into an index area I1 set in another location of the main memory.
Since the index information is created at the same time that the sorted data blocks and strings are created, the created index information is kept in the index area of the main memory if the merge process is effected immediately after creation of the sorted data blocks and strings. If the merge process is not effected immediately after creation of the sorted data blocks and strings, the created index information is kept stored in the external file together with the sorted strings and is fetched into the index area of the main memory when the merge process is effected.
Next, the merge process in the block merge is explained with reference to FIG. 3. All of the indices in the index area I1 are sorted and data records of one block are fetched from the temporary file F4 into the work buffer B1 in the main memory in an order indicated by the indices (step S11). Then, in the work area, records in the record save area (RSA) R1 in the main memory and records in the work buffer B1 are merged and arranged (step S12). Among the records in the work buffer B1, those records which have a sort order higher than the index of the block to be next fetched, and which can therefore be output, are sequentially output in the sort order to an output file F5 (external file) via an output buffer B2 (another area in the main memory) by referring to the index in the index area I1 (step S13).
When it is detected that data records to be output are no longer present in the work buffer B1, whether or not a space area is present in the record save area R1 is checked (step S14) and, if there is a space area, the data records remaining in the work buffer B1 are transferred into the record save area (RSA) R1 in the main memory (step S16). When data records are already stored in the record save area R1, the data records from the work buffer B1 are transferred in a merged configuration with the data records in the record save area R1. Then, whether or not a data block of records to be merged and output is present in the temporary file F4 is checked (step S17), and if the data block is present, the step S11 is effected again, data records of a new block are fetched from the temporary file F4 into the work buffer B1 and the same processes as described above are effected for data in the work buffer B1 and record save area R1.
If it is detected in the step S14 that the record save area R1 is full, data records in the record save area R1 are written back into the temporary file F4 as a sorted data block or string (step S15) and then the step S16 is effected. When it is determined in the step S17 that no data block of records to be merged and output is present in the temporary file F4, the process is ended.
Thus, all the data is merged and output to the output file F5 via the output buffer B2.
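The flow of steps S11 to S17 can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation of the cited publication: the record save area is treated as unbounded, so the write-back of step S15 is omitted; blocks are non-empty sorted Python lists; and the name `block_merge` and the transfer counter are hypothetical. Records equal to the next index are also output, which is safe because no unfetched block can contain a smaller record.

```python
import bisect
import heapq
from typing import List, Tuple


def block_merge(blocks: List[List[int]]) -> Tuple[List[int], int]:
    """Merge sorted blocks using one work buffer and a record save area (RSA).
    Returns the sorted output and the number of records moved into the RSA."""
    order = sorted(blocks, key=lambda blk: blk[0])  # sort the indices (head records)
    rsa: List[int] = []      # record save area R1 (assumed unbounded here)
    output: List[int] = []   # stands in for output buffer B2 / output file F5
    transfers = 0
    for i, block in enumerate(order):
        # S11 + S12: fetch one block and merge it with the saved records.
        work = list(heapq.merge(rsa, block))
        # S13: output every record that no unfetched block can precede.
        if i + 1 < len(order):
            cut = bisect.bisect_right(work, order[i + 1][0])
        else:
            cut = len(work)  # last block: everything can be output
        output.extend(work[:cut])
        # S16: the remaining records go back to the record save area.
        rsa = work[cut:]
        transfers += len(rsa)
    return output, transfers


if __name__ == "__main__":
    merged, moved = block_merge([[1, 4, 9], [2, 3, 10], [5, 6, 7]])
    print(merged)  # [1, 2, 3, 4, 5, 6, 7, 9, 10]
    print(moved)   # 4
```

In this run the records 4 and 9 are saved after the first block and 9 and 10 after the second, so the record 9 is moved into the record save area twice before it is output.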
However, the block merge has a problem that the number of record transferring operations between the work buffer B1 into which the data block is stored and the record save area R1 in the sort work area is large.
On the other hand, as indicated by an example shown in FIG. 4, in the buffer merge, work buffers B3 of a number equal to or larger than a number corresponding to the number of sorted data strings stored in a temporary file F6 are allotted in the main memory and index information constructed by representative records corresponding to the head data records of the respective data blocks is stored into an index area I2 which is the same as that described above.
All of the indices in the index area I2 are sorted and data blocks are fetched from the temporary file F6 into the work buffers B3 in the main memory in an order indicated by the indices. Then, data having a sort order higher than the index of a data block to be next fetched is read out from all of the work buffers B3 in a sort order and sequentially output to an output buffer B4 (another area in the main memory). In this case, since work buffers B3 of a number equal to or larger than a number corresponding to the number of the strings are provided, a space area into which a next data block is fetched can be obtained in one of the work buffers B3 when all the data having a sort order higher than the index of a data block to be next fetched is output.
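A minimal sketch of this buffer merge follows, assuming exactly one work buffer per sorted data string and the same hypothetical data layout as before (a string is a list of sorted blocks whose records are in sort order end to end). The name `buffer_merge` is illustrative, and records equal to the next index are output together with those preceding it, which is what guarantees here that the buffer of the next block's string is empty when that block is fetched.

```python
from collections import deque
from typing import Deque, List


def buffer_merge(strings: List[List[List[int]]]) -> List[int]:
    """Merge sorted strings using one work buffer per string (sketch)."""
    # Index area I2: (head_record, string_no, block_no), sorted.
    index = sorted(
        (block[0], s, b)
        for s, blocks in enumerate(strings)
        for b, block in enumerate(blocks)
    )
    buffers: List[Deque[int]] = [deque() for _ in strings]  # work buffers B3
    output: List[int] = []                                  # output buffer B4
    for pos, (_, s, b) in enumerate(index):
        if buffers[s]:
            raise RuntimeError("work buffer not free; input strings are not sorted")
        buffers[s].extend(strings[s][b])  # fetch the next block in index order
        limit = index[pos + 1][0] if pos + 1 < len(index) else None
        # Output, in sort order, all records not following the next index.
        while True:
            ready = [buf for buf in buffers
                     if buf and (limit is None or buf[0] <= limit)]
            if not ready:
                break
            smallest = min(ready, key=lambda buf: buf[0])
            output.append(smallest.popleft())
    return output


if __name__ == "__main__":
    strings = [[[1, 4], [6, 9]], [[2, 3], [5, 8]]]
    print(buffer_merge(strings))  # [1, 2, 3, 4, 5, 6, 8, 9]
```

Each record moves from its work buffer directly to the output, with no intermediate save area, but the number of work buffers grows with the number of strings.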
The conventional block merge and buffer merge have the following problems. The block merge has a problem that the number of record transferring operations is large since the record is transferred between the work buffer B1 into which the data block is fetched and the record save area R1 in the sort work area each time the record is fetched. Further, in the buffer merge, the process can be effected with a small number of record transferring operations, but it cannot be applied if the number of sorted data strings exceeds the number of buffers which can be prepared, and therefore, a large number of buffers must be used.