1. Field of the Invention
The present invention relates to a sort processing device for sorting a large volume of data at high speed, a sorting method using the sort processing device, and a data processing apparatus for effecting data retrieval and the like.
2. Description of the Related Art
FIG. 18 shows a conventional data processing apparatus described in "Information Processing" Vol. 33, No. 12, pp. 1416-1423. Reference numeral 1 denotes a data processing apparatus, and 2 denotes a controller for controlling a database processing device 3 by interpreting a command sent thereto from a CPU 7. Numeral 3 denotes the database processing device for effecting database processing with respect to data stored in a disk device 8, a main storage device 6, and the like, and numeral 4 denotes a sort processing device for effecting sort processing in response to an instruction from the database processing device. The controller 2, the database processing device 3, and the sort processing device 4 are located in the data processing apparatus 1. Numeral 5 denotes a bus of a host computer 9 for connecting the data processing apparatus 1, the main storage device 6, the CPU 7, the disk device 8, and the like. Numeral 6 denotes the main storage device of the host computer, 7 denotes the CPU of the host computer, 8 denotes the disk device for storing data in the host computer, and 9 denotes the overall host computer.
Next, a description will be given of the operation. If a request for data processing is generated in the host computer 9, the CPU 7 of the host computer 9 consecutively fetches data from the disk device 8 in which object data are stored, and continuously transmits the same to the data processing apparatus 1 via the bus 5. At this time, the main storage device 6 of the host computer 9 is used as an input/output buffer area, as necessary. When data are inputted to the data processing apparatus 1, the data processing apparatus 1 effects processing by the database processing device 3 and sort processing by the sort processing device 4, and returns the results to the CPU 7 again via the bus 5. The CPU 7 stores the returned result data in the disk device 8 in the same way as during inputting. The inputting of data to the data processing apparatus 1 and the outputting of the result data from the data processing apparatus 1 are executed in parallel by the controller 2.
Next, a detailed description will be given of the operation of the database processing device 3. With respect to the data inputted to the database processing device 3 from the controller 2, the database processing device 3 executes database processing other than sort processing, such as the selection of data, format conversion, and merge. There are cases where the database processing device 3 is realized by special-purpose hardware, or it is realized by the use of one or a plurality of general-purpose microprocessors. Depending on the contents of the instruction from the CPU 7, the database processing device 3 effects sort processing by controlling the sort processing device 4 when sort processing is necessary. Generally, prior to sort processing the database processing device 3 effects the selection of data, format conversion, and the like, or effects totalization processing and the like after sort processing. Meanwhile, if the sort processing is not necessary, the database processing device 3 alone effects the selection processing of data and the like, and returns the results to the CPU 7 via the controller 2. In addition, at this juncture, as shown in, for example, Japanese Patent Application Laid-Open No. 63-86043, the storage device of the sort processing device 4 is shared, the sort processing device is stopped, and its storage device is used as the storage device of the database processing device 3 so as to be used as a large-capacity buffer storage device in processing such as the merging, combination, and the like of data.
An example of the configuration of the above-described database processing device is shown in FIG. 19. In FIG. 19, the same reference numerals as those shown in FIG. 18 denote identical or corresponding parts. Numerals 34 and 35 denote general-purpose microprocessors, and numerals 36 and 37 denote main storage memories which are respectively connected to the microprocessors 34 and 35. Numeral 38 denotes a bus for connecting the two general-purpose microprocessors 34 and 35, the controller 2, and the sort processing device 4; numeral 31 denotes a bus for inputting data to the sort processing device 4; numeral 32 denotes a bus for outputting the data from the sort processing device 4; and numeral 33 denotes a bus for accessing the shared storage device in the sort processing device 4. Hereafter, a description will be given of the operation of the database processing device 3 separately with respect to a case where the designated data processing uses the sort processing device 4, and a case where it does not.
In the case where the designated processing uses the sort processing device 4, as for the microprocessors 34 and 35, the microprocessor 34, for instance, is assigned to data selection processing with respect to input data. The microprocessor 34 continuously receives the data which are sent thereto from the controller 2 via the bus 38, fetches only necessary data by using its main storage memory 36, and sends the necessary data consecutively to the sort processing device 4 via the bus 31. The sort processing device 4 continuously receives these data, rearranges them and consecutively sends the results back to the microprocessor 35 via the bus 32. Upon receiving the results, the microprocessor 35 effects, for example, the format conversion, aggregate function, and the like of data by using the main storage memory 37, and sends the results back to the controller 2 via the bus 38.
In the case where the designated processing does not use the sort processing device 4, the input data processing and the output data processing are respectively allocated to the microprocessors 34 and 35. In this case, since the sort processing device 4 is not used, its operation is stopped, and the storage device of the sort processing device 4 is alternatively used as the main storage of the microprocessors 34 and 35 via the bus 33. Namely, the microprocessors 34 and 35 utilize as their shared storage device a part of the storage device which the sort processing device 4 has, in addition to the main storage memories 36 and 37 which the microprocessors 34 and 35 respectively have. Because the data sent to this storage device are partially stored there, the number of inputs to and outputs from the controller 2 can be reduced, thereby making it possible to improve the processing speed. For example, processing for merging groups of data stored in a plurality of files can be executed such that the microprocessor 34 temporarily stores in the area of the shared storage device in the sort processing device 4 the data of the group of files received from the controller 2 while the data are being consecutively classified in correspondence with the files, and, at the same time, the microprocessor 35 merges in parallel the data of the respective files located in that area.
Next, a description will be given in detail of the operation of the sort processing device 4. Data strings sent from the CPU 7 via the database processing device 3 are continuously inputted to the sort processing device 4, and are rearranged in a designated order, and the results are returned to the database processing device 3 again. This process is shown in FIG. 20 described in the aforementioned "Information Processing." FIG. 20 is a diagram illustrating the internal configuration of the sort processing device 4. In FIG. 20, the same reference numerals as those shown in FIG. 19 denote identical or corresponding portions. Reference numeral 41 denotes a first-stage sort processor for effecting sort processing first with respect to the data inputted through the bus 31; numeral 42 denotes a second-stage sort processor for effecting sort processing with respect to the output data sorted by the first-stage sort processor; and numerals 43 and 44 denote a third-stage sort processor and a fourth-stage sort processor each adapted to effect sort processing with respect to the output data from the respective preceding-stage sort processor. The output data from the sort processor in the fourth stage, i.e., a final stage, are outputted to the microprocessor 34 or 35 via the bus 32. Although the four sort processors 41 to 44 are illustrated here to simplify the description, the number of the sort processors may be increased or decreased, as required. Reference numerals 45 to 48 denote shared storage devices which are respectively connected to the sort processors 41 to 44. The storage capacities of the shared storage devices 45 to 48 vary in correspondence with the sort processors 41 to 44 connected thereto. For example, the storage capacity of the shared storage device connected to an ith-stage sort processor is calculated as 2 to the (i-1)th power.
Next, a description will be given of the details of sort processing by the sort processing device 4. FIG. 21 is a diagram illustrating the contents of data inputted to the respective sort processors as well as their input timings. Numeral 49a denotes a data string inputted to the first-stage sort processor; 49b, a data string inputted to the second-stage sort processor; 49c, a data string inputted to the third-stage sort processor; and 49d, a data string inputted to the fourth-stage sort processor.
Now, a case is considered in which the data
8, 2, 1, 3, 5, 7, 6, 4
are consecutively inputted to the sort processing device 4, and sorting is carried out in descending order. First, the leading sort processor 41 in the first stage fetches the inputted data in two's, rearranges them, and sends them to the sort processor 42 in a subsequent stage. The data inputted in two's to the subsequent-stage sort processor are
82, 31, 75, 64
Here, the order of the data "1" and "3" sent from the preceding-stage sort processor 41 is reversed, and the data are outputted as a combination of two pieces of data sorted in the reverse order as "31." The data which are thus sorted in two's are inputted to the second-stage sort processor 42, which fetches them in two sets and merges them, and sends data strings sorted in four's to the subsequent stage. The result is
8321, 7654
Here, if, for example, "82" and "31" are merged, the result is "8321." The data thus sorted in four's are inputted to the third-stage sort processor 43, which fetches them in two sets and merges them, and sends data strings sorted in eight's to the subsequent stage. The result is
87654321
The fourth-stage sort processor 44 and subsequent sort processors also effect similar processing.
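The stage-by-stage merging described above can be illustrated with a short Python sketch. It is only a software model of the data flow, not the hardware of FIG. 20; the function names merge_desc, stage, and pipeline_sort are hypothetical and are introduced here purely for illustration.

```python
def merge_desc(a, b):
    """Merge two runs that are each in descending order into one descending run."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] >= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def stage(runs):
    """Model one sort processor: merge each adjacent pair of incoming runs."""
    return [
        merge_desc(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
        for k in range(0, len(runs), 2)
    ]


def pipeline_sort(data, n_stages):
    """Pass the data through n_stages stages and record every stage's output."""
    runs = [[x] for x in data]  # each input datum is a run of length one
    history = []
    for _ in range(n_stages):
        runs = stage(runs)
        history.append(runs)
    return history


history = pipeline_sort([8, 2, 1, 3, 5, 7, 6, 4], 3)
for number, runs in enumerate(history, start=1):
    print("stage", number, runs)
```

Each entry of history corresponds to the data string handed from one stage to the next, so the run length doubles at every stage until a single sorted run remains.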
Here, as shown in FIG. 21, each of the sort processors 41 to 44 is capable of starting processing before the preceding-stage sort processor completes all the processing. Consequently, it can be seen that if the data are inputted continuously, the sorted results are outputted in parallel with the data input with a slight time lag.
For example, a description will be given of the start of processing by the second-stage sort processor 42. The first-stage sort processor 41 receives "8" in Step S1 and "2" in Step S2. Then, the first-stage sort processor 41 compares "8" and "2" in Step S3, and outputs "8" which is a greater numerical value, and receives a new numerical value "1." Then, in Step S4, the first-stage sort processor 41 compares "2" and "1" which are presently stored, and outputs "2," and receives a new numerical value "3."
Meanwhile, the second-stage sort processor 42 starts operation in Step S3, and receives the data "8" outputted from the first-stage sort processor 41. Then, in the same way as in Step S3, the second-stage sort processor 42 receives "2" in Step S4 and "3" in Step S5. Then, in Step S6, the second-stage sort processor 42 compares "8" and "3" and outputs "8" which is a greater numerical value, and designates "2" as the data to be compared next. Meanwhile, the second-stage sort processor 42 receives a new numerical value "1" from the first-stage sort processor 41, and this value "1" is stored by being stacked after "3." In Step S7, the second-stage sort processor 42 compares "2" and "3," and outputs "3" which is a greater numerical value.
As described above, the output of the sorted result is started before the sort processor receives the overall data string to be sorted (in this case, before Step S6).
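The early start of output can be sketched with Python generators. This is a pull-based software model under stated assumptions: the names merge_stage and counted are hypothetical, and the exact step at which each datum is read differs slightly from the step timing of FIG. 21, because a generator reads the next datum only when it is needed. The point it illustrates is the same: a stage emits its first result after holding one complete run plus the first datum of the following run.

```python
from itertools import islice

_END = object()  # sentinel marking the end of a run


def merge_stage(source, run_len):
    """Lazily merge consecutive descending runs of length run_len."""
    source = iter(source)
    while True:
        first = list(islice(source, run_len))  # buffer one whole run
        if not first:
            return
        second = islice(source, run_len)  # the next run is read on demand
        i = 0
        b = next(second, _END)
        while i < len(first) and b is not _END:
            if first[i] >= b:
                yield first[i]
                i += 1
            else:
                yield b
                b = next(second, _END)
        yield from first[i:]
        while b is not _END:
            yield b
            b = next(second, _END)


def counted(data, log):
    """Pass data through unchanged while recording what has been read so far."""
    for x in data:
        log.append(x)
        yield x


# Second-stage view: its input is the first-stage output 8, 2, 3, 1.
consumed = []
second_stage = merge_stage(counted([8, 2, 3, 1], consumed), 2)
first_out = next(second_stage)
consumed_at_first_output = list(consumed)
rest = list(second_stage)

# Chaining three stages sorts the eight-element example in descending order.
data = [8, 2, 1, 3, 5, 7, 6, 4]
full = list(merge_stage(merge_stage(merge_stage(data, 1), 2), 4))
print(first_out, consumed_at_first_output, rest, full)
```

In this model the second stage emits "8" after reading only "8," "2," and "3," that is, before the last datum "1" of its input has been read.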
In this way, the rearrangement, i.e., sorting, of 2^n pieces of data is carried out by n sort processors.
Here, since the storage capacities of the shared storage devices 45 to 48 connected to the respective sort processors 41 to 44 are determined by the capacities of memory chips, the configuration, and the like, these storage capacities in reality are not necessarily two-fold that of the preceding stage. For instance, memory chips each having a capacity of 512 KB are mounted in all of the first 10 stages, a memory chip having a capacity of 1 MB is mounted in the 11th stage, a memory chip having a capacity of 2 MB is mounted in the 12th stage, and likewise in the subsequent stages.
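The relation between the doubling capacity and the capacity actually mounted can be sketched as follows. The sketch rests on two assumptions that are not stated in the text: the ideal capacity of the ith stage is taken as a base unit multiplied by 2 to the (i-1)th power, and the base unit is taken as 1 KB so that the figures line up with the example of 512 KB, 1 MB, and 2 MB chips. The names ideal_capacity and mounted_capacity are hypothetical.

```python
KB = 1024
MIN_CHIP = 512 * KB  # smallest memory chip in the example above


def ideal_capacity(stage, unit=KB):
    """Ideal capacity of the ith stage: unit * 2**(i-1). The 1 KB unit is an assumption."""
    return unit * 2 ** (stage - 1)


def mounted_capacity(stage, unit=KB, min_chip=MIN_CHIP):
    """Capacity actually mounted: never smaller than the smallest available chip."""
    return max(min_chip, ideal_capacity(stage, unit))


for i in (1, 9, 10, 11, 12):
    print(i, ideal_capacity(i) // KB, "KB ideal,", mounted_capacity(i) // KB, "KB mounted")
```

Under these assumptions the first 10 stages all receive 512 KB, and the mounted capacity doubles from the 11th stage onward, so only the later stages follow the two-fold rule.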
Since the conventional data processing apparatus 1 is configured as described above, the following problems are encountered.
Different processing cannot be executed while certain processing is being executed. For instance, if the data processing apparatus 1 starts the execution of processing which requires a long time in execution, other processing cannot be executed.
In particular, in cases where a small number of processing operations whose loads are heavy and a large number of processing operations whose loads are light are present, if the execution of processing whose load is heavy is started, the wait time for the processing whose load is light becomes very long, resulting in a decline in the throughput of the system.
If an attempt is made to execute the plurality of processing operations simultaneously to overcome the above-described problem, there arises a need to connect a plurality of data processing apparatuses to the host computer, which results in increased cost.
In general, in corporate database operations, a multiplicity of processing operations whose loads are relatively light, such as marketing support, are executed in the daytime, while a small number of processing operations whose loads are heavy, such as daily batch processing, are executed in the nighttime. For this reason, a large number of data processing apparatuses are desirable for operations in the daytime, while a small number of high-speed data processing apparatuses are desirable for operations in the nighttime. However, insofar as the conventional data processing apparatuses are used, it is necessary to install a large number of data processing apparatuses in order to meet the demands for the daytime, while most of these data processing apparatuses are not used in the nighttime. Hence, the rate of utilization of resources declines.