1. Field of the Invention
This invention relates to a sort processor and a sort processing device for rearranging a large quantity of data, i.e., for sort processing in database processing and business data processing. In this specification, a sort processor and a sort processing device are distinguished. A sort processor is defined as a single processor that performs a portion of a sorting process, whereas a sort processing device is defined as a device that performs a complete sorting process using multiple sort processors connected in series.
2. Description of the Prior Art
FIG. 6 shows a computer system equipped with a conventional sort processing device, which is disclosed in "Information Processing", Vol. 33, No. 12, 1992, p.p. 1416-1423. In the figure, the computer system comprises a sort processing device 1, a system bus 2 of a host computer, a main storage device 3 of the host computer, a CPU 4 of the host computer, a disk drive 5 for storing data in the host computer, and a host computer 6.
Operation of the conventional sort processing device is explained by referring to FIG. 6. Firstly, an outline of the operation is explained. When a demand for data processing is generated in the host computer 6, the CPU 4 in the host computer continuously extracts data from the disk drive 5, where the object data are stored, and continuously transmits the extracted data to the sort processing device 1 via the system bus 2. At this time, the main storage device 3 of the host computer 6 is used as an input-output buffer area as needed. The sort processing device 1 performs sort processing as the data are inputted, and returns the result to the host computer 6 via the system bus 2. The host computer 6 stores the returned result data in the disk drive 5 in the same manner as in data inputting. The inputting of data to the sort processing device 1 and the outputting of result data from the sort processing device 1 are performed in parallel.
Secondly, the detailed operation of the sort processing device is explained below. The sort processing device 1 continuously receives a sequence of data transmitted by the host computer 6, rearranges the inputted data in an assigned order, and returns the result to the host computer 6. This operation is explained using FIG. 5 disclosed in the above-mentioned "Information Processing". FIG. 7 shows the construction of the sort processing device 1 of FIG. 6. The sort processing device 1 is constructed by serially connecting sort processors 11, 12, 13 and 14. The respective processors 11-14 are connected to the data storage devices 15, 16, 17 and 18. A host computer interface 21 is used for exchanging data and instructions with the host computer 6, and a controller 19 is used for controlling the entire sort processing device 1. The sort processors 11-14 are called, respectively, the sort processor in the first stage, the sort processor in the second stage, the sort processor in the third stage, and the sort processor in the fourth stage. The sort processor in the ith stage has a storage capacity corresponding to 2^(i-1) data values.
An operation is explained below using an example where data are inputted to the sort processing device 1 in the order of "8, 2, 1, 3, 5, 7, 6, 4".
Firstly, the sort processor 11 of the first stage extracts the inputted data in groups of two, rearranges the data of each group in the assigned order, and transmits the rearranged data to the sort processor of the next stage. The sorted groups are inputted to the next stage in the order of (8, 2), (3, 1), (7, 5), (6, 4), . . . .
As these numbers show, in the sort processor 11 of the first stage, the inputted data "1, 3" are rearranged, and the sorted pair (3, 1) is outputted. A sorted combination of data is hereinafter referred to as a data string or a string. The sort processor 12 of the second stage receives these sorted data strings, merges successive pairs of the strings, and transmits the merged data strings, each sorted for four data values, to the next sort processor 13. The resulting strings are (8, 3, 2, 1), (7, 6, 5, 4), . . . . That is, as a result of sort processing by the sort processor 12 of the second stage, the inputted data strings (8, 2) and (3, 1) are merged to output the data string (8, 3, 2, 1). The sort processor 13 of the third stage receives those data strings sorted for four data values, merges successive pairs of strings, and transmits the merged data string, sorted for eight data values, to the sort processor 14 of the next stage. The result of the sorting is (8, 7, 6, 5, 4, 3, 2, 1), . . . . The sort processor 14 and succeeding sort processors perform similar processing.
It is possible for the sort processors of the respective stages to start the sorting process before the sort processor of the preceding stage completes its sorting process. Therefore, when data is continuously inputted to the sort processor of the respective stages, the sort result is outputted in parallel with the data input after some delay.
In this manner, "n" sort processors carry out the rearrangement, that is, the sorting, of 2^n data. The respective sort processors utilize the data storage devices 15, 16, 17 and 18 connected to them as storage areas in their comparison and merge processes.
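The stage-by-stage behaviour described above can be illustrated with a minimal sketch. It is a batch model only: it does not reproduce the overlapped, streaming operation of the stages, and the function names (first_stage, merge_two, merge_stage) are hypothetical names chosen for this illustration, not taken from the cited references. Descending order is assumed, as in the example.

```python
def first_stage(values, descending=True):
    """Stage 1: take the inputs in groups of two and order each group."""
    return [
        tuple(sorted(values[i:i + 2], reverse=descending))
        for i in range(0, len(values), 2)
    ]


def merge_two(a, b, descending=True):
    """Two-way merge of two already-sorted strings."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        take_a = a[i] >= b[j] if descending else a[i] <= b[j]
        if take_a:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # one of the two tails is empty
    out.extend(b[j:])
    return tuple(out)


def merge_stage(strings, descending=True):
    """Stage i (i >= 2): merge successive pairs of sorted strings."""
    merged = []
    for k in range(0, len(strings), 2):
        pair = strings[k:k + 2]
        if len(pair) == 2:
            merged.append(merge_two(pair[0], pair[1], descending))
        else:
            merged.append(pair[0])  # odd string left over, passed through
    return merged


data = [8, 2, 1, 3, 5, 7, 6, 4]
stage1 = first_stage(data)    # [(8, 2), (3, 1), (7, 5), (6, 4)]
stage2 = merge_stage(stage1)  # [(8, 3, 2, 1), (7, 6, 5, 4)]
stage3 = merge_stage(stage2)  # [(8, 7, 6, 5, 4, 3, 2, 1)]
print(stage1, stage2, stage3, sep="\n")
```

Each call doubles the length of the sorted strings, which is why three stages suffice for the eight values of the example.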
Next, the operation of the sort processors is explained with reference to FIG. 9, which shows the construction of the conventional sort processor disclosed in "Information Processing", Vol. 31, No. 4, 1990, p.p. 457-465. For simplicity of explanation, FIG. 9 shows only the inside of the sort processor 12; the insides of the other sort processors are the same as that of the sort processor 12. The sort processor of FIG. 9 comprises a comparator 120 for comparing data strings, latch registers 121 and 122 for temporarily storing a portion of the compared data, data input ports 123 and 124 for inputting the data to the comparator 120, and a controller 125 for controlling operations of the sort processor. The latch registers 121 and 122 have a data size equal to the comparison data size of the comparator 120 and a data width equal to that of the input ports 123 and 124. In the following explanation, the data size is assumed to be 4 bytes. An input bus 127 inputs the data from the preceding sort processor 11, and a data and address bus 128 exchanges data with the data storage device 16 connected to the sort processor 12.
The sort processing by the sort processor 12 is explained in detail below. The explanation is done using an example of a case when the sort processor 12 is inputted with data strings (8, 2), (3, 1), . . . in sequence from the sort processor 11 of the preceding stage, the sort processor 12 merges the data strings to a data string (8, 3, 2, 1), . . . and outputs the merged data string to the sort processor 13 of the next stage.
First of all, the data string (8, 2), which is inputted first, is stored as it is in the data storage device 16 connected to the sort processor 12. Secondly, the data "3" of the data string (3, 1) is inputted into the sort processor 12 and stored in the data storage device 16 in the same manner. Thirdly, the top data "8" of the first data string (8, 2) and the top data "3" of the data string (3, 1) are loaded from the data storage device 16 into the latch registers 121 and 122, 4 bytes at a time, and are compared in the comparator 120. In this comparison, the data are compared 4 bytes at a time in sequence.
In the sort processing, if ascending order is assigned for the sort key, the smaller of the compared data is outputted first. If descending order is assigned, the larger of the compared data is outputted first. The controller 125 controls the ascending/descending order for the sort key, which begins from the top byte of a data value. When the comparison result is determined, the selected data is outputted to the next sort processor 13. In parallel with this comparison process, the next data "1" of the second data string (3, 1) is stored in the data storage device 16.
In this case, as a result of the comparison, the data "8" is outputted to the sort processor 13 of the next stage. Therefore, in the following comparison, the data "3" and the data "2", which follows the "8" in the data string (8, 2), are compared. The comparison is carried out by loading the data "3" and the data "2" from the data storage device 16 into the respective latch registers 121 and 122, 4 bytes at a time, and by comparing the data 4 bytes at a time.
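The comparison of keys in 4-byte units, with the ascending/descending assignment deciding which data is outputted first, can be sketched as follows. This is an illustration under assumptions not stated in the references: keys are modelled as equal-length byte strings compared as unsigned values from the top byte, and the names compare_keys and select_output are hypothetical.

```python
def compare_keys(key_a: bytes, key_b: bytes, word: int = 4) -> int:
    """Compare two equal-length keys one word (4 bytes) at a time,
    starting from the top byte. Returns -1, 0 or 1."""
    if len(key_a) != len(key_b):
        raise ValueError("keys must have equal length")
    for off in range(0, len(key_a), word):
        wa = key_a[off:off + word]  # models the load into latch register 121
        wb = key_b[off:off + word]  # models the load into latch register 122
        if wa != wb:
            # bytes objects compare lexicographically as unsigned values
            return -1 if wa < wb else 1
    return 0


def select_output(key_a: bytes, key_b: bytes, ascending: bool) -> bytes:
    """Return the key that is outputted first for the assigned order."""
    c = compare_keys(key_a, key_b)
    if ascending:
        return key_a if c <= 0 else key_b
    return key_a if c >= 0 else key_b


k8 = (8).to_bytes(4, "big")
k3 = (3).to_bytes(4, "big")
print(int.from_bytes(select_output(k8, k3, ascending=False), "big"))
print(int.from_bytes(select_output(k8, k3, ascending=True), "big"))
```

With descending order, "8" is selected before "3", as in the example above; with ascending order the selection is reversed.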
Because a conventional sort processing device is constructed in the way explained above, there have been problems such as the following:
(1) In order to improve the performance of the sort processing, it would be desirable to increase the number of data values which the sort processors can compare at one time to, for example, 4 or 8, but this has been difficult. For instance, "Electric Communication Society Article Magazine", Vol. 1, J 66 D, No. 3, 1983/3, p.p. 332-339, discloses that, in theory, it is possible to merge K input strings into one. However, in the conventional sort processor, the comparison result is controlled by information telling which of the two data inputs is larger, and the above reference does not disclose a way to increase the number of data values which the sort processors can compare at one time beyond 2, to 4 or 8. Therefore, it has been impossible to perform the sorting process efficiently by simply changing the number of data values that the sort processor can compare at one time from 2 to 4 or 8.
(2) For example, in the conventional construction, if the number of latch registers is increased from the two latch registers 121 and 122 to 4 or 8, it becomes necessary in the comparison process to read out the K data that are the object of the comparison, 4 bytes at a time, from the data storage device 16. That is to say, during the comparison of 4 bytes, it has been necessary to access the data storage device 16 4 × K times. Thus, the more K increases, the more accesses to the data storage device 16 are needed, and this results in deterioration of performance.
As illustrated in FIG. 10 (a) showing the conventional art, for the data flowing on the bus 128 to the data storage device 16, a data access is carried out every time a comparison takes place. In other words, one data value is read out (r1) to the latch register 121, and then another data value is read out (r2) to the latch register 122. These data are then compared, and the comparison result is outputted to the sort processor 13 of the next stage. At the same time, the data inputted from the sort processor 11 of the preceding stage is written (w) to the data storage device 16. The other sort processors synchronously perform similar operations on their respective data storage devices in parallel. Therefore, in this case, one comparison takes three cycles per physical comparison unit of 4 B (bytes).
In contrast, if the number is extended to, for example, K=4, a comparison of four data takes place in the cycles r1, r2, r3, r4 and w. In this case, the top processor (P1+P2) functions in the same manner as the combination of the sort processors 11 (P1) and 12 (P2). That is, the top processor performs the process of merging the four data. Therefore, 5 cycles are necessary for the comparison of 4 bytes. As explained above, in general, when K data are compared at one time, (K+1) cycles are required per 4 B, and this results in deterioration of performance.
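The cycle count stated above can be written out as a short calculation. This is only a sketch of the arithmetic in the text, K read cycles (r1 to rK) plus one write cycle (w) per 4-byte comparison unit; the function name cycles_per_unit is hypothetical, and the value for K=8 is obtained from the general (K+1) rule rather than from a figure.

```python
def cycles_per_unit(k: int) -> int:
    """Bus cycles per 4-byte comparison unit: k reads (r1..rk) plus one write (w)."""
    if k < 2:
        raise ValueError("a merge compares at least two inputs")
    return k + 1


for k in (2, 4, 8):
    print(f"K={k}: {cycles_per_unit(k)} cycles per 4 B")
```

For K=2 this gives the three cycles of FIG. 10 (a), and for K=4 the five cycles mentioned above.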
(3) In order to avoid this deterioration of performance, it has been considered to store the key values in a buffer and thereby avoid accessing the data storage device. However, in practice, this approach also has some problems. If K buffers are used in a comparison of K data, it is necessary, each time a data value is outputted, to read the succeeding data into the key value buffer corresponding to the outputted data before the next comparison can start. Therefore, time is wasted reading out the key value between the comparison processes, and this also deteriorates performance.
(4) Generally, in sort processing, it is also necessary to merge multiple fields inside a record, to independently assign ascending or descending order to the respective fields, and to rearrange the data according to the assigned order. The controller in the sort processor is therefore required to judge whether ascending or descending order applies to each of the multiple key fields, and to perform an operation according to that order. As a result, there has been a problem that the hardware logic becomes complicated.
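The per-field order assignment described in item (4) can be sketched in software as follows. This is an illustration only, not the controller logic of the conventional device: records are modelled as byte strings, each key field is described by a hypothetical (start, length, ascending) tuple, and make_comparator is a name introduced for this example.

```python
from functools import cmp_to_key


def make_comparator(fields):
    """fields: list of (start, length, ascending) key-field descriptions,
    given in order of priority."""
    def compare(rec_a: bytes, rec_b: bytes) -> int:
        for start, length, ascending in fields:
            fa = rec_a[start:start + length]
            fb = rec_b[start:start + length]
            if fa != fb:
                result = -1 if fa < fb else 1
                # flip the result for a field assigned descending order
                return result if ascending else -result
        return 0  # all key fields equal
    return compare


records = [b"B002", b"A001", b"A003", b"B001"]
# field 1: byte 0, ascending; field 2: bytes 1-3, descending
compare = make_comparator([(0, 1, True), (1, 3, False)])
ordered = sorted(records, key=cmp_to_key(compare))
print(ordered)
```

Every comparison has to consult the order assignment of the field in which the first difference occurs, which is the judgment the text attributes to the controller.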
(5) In addition, when a sort processor is realized as an LSI, there is a limitation on the number of pins of an LSI package. That is, the comparator and the control device in a sort processor can be realized by relatively simple hardware logic and formed as an LSI. However, although LSI technology is advancing and makes it possible to integrate a great amount of hardware logic in one LSI, the sort processor of the above-mentioned conventional art cannot easily be integrated in one LSI, because of the limitation on the number of pins provided in one LSI.
For example, it is currently possible to integrate two or more sort processors according to the conventional art in an LSI. On the other hand, the number of the pins is limited. More concretely, in the above case, the size of the three buses used for connecting with the external devices becomes as follows:
A data input bus 126 from the preceding stage: 32 bits,
A data output bus 127 to the succeeding stage: 32 bits,
An address and data bus of the data storage device: the address bus: 32 bits, the data bus: 32 bits.
In addition to the numbers above, judging from the design of an ordinary LSI, the power supply, the ground and other control signals require approximately 30 pins, so that a total of about 160 external LSI pins is necessary. If, for example, two sort processors are integrated in a single LSI (in the above situation, the sort processors 11 and 12 are integrated in one LSI), the 32 pins of the data bus connecting the sort processors become unnecessary because that bus is realized internally. Nevertheless, about 220 pins in total are still necessary. In general, in order to integrate N sort processors into one LSI, approximately 32 + 32 + 32 × 2 × N + 30 pins are required. If N is 4, the total number of pins required is approximately 350. That is, even if increases in the degree of integration make it possible to place multiple sort processors on a single LSI, they cannot actually be integrated in a single LSI, because the number of pins in an LSI package is limited.
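The pin estimate can be checked with a short calculation. This sketch simply evaluates the expression given in the text (one input bus, one output bus, an address bus and a data bus per integrated sort processor, and about 30 other pins); the constant and function names are hypothetical.

```python
BUS_WIDTH = 32   # bits per bus, as listed above
OTHER_PINS = 30  # power supply, ground and control signals (approximate)


def pins_required(n: int) -> int:
    """Approximate external pins for n sort processors in one LSI:
    input bus + output bus + (address bus + data bus) per processor + others."""
    return BUS_WIDTH + BUS_WIDTH + 2 * BUS_WIDTH * n + OTHER_PINS


for n in (1, 2, 4):
    print(f"N={n}: about {pins_required(n)} pins")
```

The results, 158, 222 and 350, agree with the rounded figures of about 160, 220 and 350 pins.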
(6) As a result, when realizing a sort processing device, only extremely few, 1 or 2, sort processors are integrated in a single LSI although there is extra integrating capacity. Therefore, the sort processing device is constructed from relatively many LSIs of sort processors, and this results in an increase in scale.
(7) Furthermore, there is another problem that the hardware scale of the data storage devices connected to the sort processors also becomes large. For instance, if the number of data compared at one time in the respective sort processors is two, the capacity of the data storage device connected to each sort processor doubles with each successive stage. If the capacity for the sort processor of the first stage is approximately 64 bytes, the capacity increases to 128 bytes in the second stage, 256 bytes in the third stage, 32 K bytes in the tenth stage, and 32 M bytes in the twentieth stage. In contrast, a current DRAM has a capacity of 16 M bits or 64 M bits and an access width of 8 bits per chip. Even to realize the above small-capacity data storage devices, if a data storage device with a width of 32 bits is used, for example, it is necessary to use at least 4 DRAMs (each with an access width of 8 bits). Four DRAMs of 16 M bits are therefore required for each of the first through eighteenth stages, for a total of 72 DRAMs. The total capacity of those DRAM chips is 144 MB, but only about 16 MB is actually used. Because many DRAM chips are used with low efficiency, the hardware scale becomes large.
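The capacity figures in item (7) can be reproduced with the following sketch. It assumes, as in the text, 64 bytes for the first stage, doubling per stage, and four 16 M bit DRAM chips (8 bits wide each) per stage for the first eighteen stages; M is taken as 2^20, and all names are hypothetical.

```python
FIRST_STAGE_BYTES = 64
CHIP_BITS = 16 * 2**20   # one 16 M bit DRAM chip
CHIPS_PER_STAGE = 4      # 32-bit width built from 8-bit-wide chips
STAGES = 18


def stage_capacity(i: int) -> int:
    """Bytes of data storage needed by the sort processor of stage i."""
    return FIRST_STAGE_BYTES * 2 ** (i - 1)


used_bytes = sum(stage_capacity(i) for i in range(1, STAGES + 1))
chip_count = CHIPS_PER_STAGE * STAGES
installed_bytes = chip_count * CHIP_BITS // 8

print(stage_capacity(10), stage_capacity(20))  # 32 K bytes, 32 M bytes
print(chip_count, installed_bytes // 2**20)    # 72 chips, 144 MB installed
print(f"used: {used_bytes / 2**20:.2f} MB, "
      f"utilization: {used_bytes / installed_bytes:.1%}")
```

Under these assumptions only about one ninth of the installed DRAM capacity holds data, which is the low efficiency referred to above.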