1. Field of the Invention
The present invention relates to a parallel sorting system having a plurality of processing devices connected in a network for allowing the amount of communication of each processing device to be decreased so as to increase overall throughput.
2. Description of the Related Art
In a parallel sorting system having a plurality of dedicated processing devices in a network, a memory device and a merging device (which receives two sorted vectors and merges them) are connected with an algorithm-specific connection network. By controlling these devices and the connection network, these vectors are repeatedly merged and divided so as to be arranged in smaller-to-larger order or larger-to-smaller order. Thus, the vectors are sorted.
On the other hand, when the parallel sorting process is performed by a parallel computer constructed of a plurality of general-purpose processing devices connected in a network, 2.sup.n (where n is a predetermined integer) processing devices and a network are used. Each processing device has a main storage device and a merging device which merges sorted vectors. The network connects the processing devices. Each processing device sorts vectors stored in the main storage device. Next, the processing device substitutes the sorted vectors with those of another processing device (that is, the two processing devices exchange their sorted vectors). The processing device merges the original vectors with the received vectors. The processing device divides the merged vectors into a first half unit and a second half unit and stores one of them in the main storage device. This process sequence is repeated while the paired processing device is changed from one sub stage to the next. Thus, the vectors stored in the main storage device of each processing device connected in the network are arranged in smaller-to-larger order or larger-to-smaller order. In this manner, the sorting process is performed.
Next, a conventional sorting system will be described. FIG. 1 is a schematic diagram used to explain a conventional sorting system having four processing devices.
In the figure, the four processing devices 0 to 3 store data (2, 6, 10), (1, 7, 8), (4, 5, 11), and (0, 3, 9), respectively. These data are sorted.
In the figure, each arrow which connects two processing devices represents both a pair of the processing devices which substitute and merge data and a data storing method for the merged data. In other words, two processing devices which are connected with an arrow represent a pair of processing devices which substitute and merge data. The merged data is divided into two units. The smaller half unit is referred to as the first half unit, whereas the larger half unit is referred to as the second half unit. The second half unit of data is stored in the processing device pointed at by the arrow, whereas the first half unit is stored in the processing device at the tail of the arrow.
Data is substituted, merged, and divided between the processing device 0 and the processing device 1. The first half unit of the merged data is stored in the processing device 0, whereas the second half unit of the merged data is stored in the processing device 1. In other words, after the data (2, 6, 10) and the data (1, 7, 8) have been substituted and merged between the processing devices 0 and 1, respectively, the merged data (1, 2, 6, 7, 8, 10) are generated. The data (1, 2, 6, 7, 8, 10) is divided into the first half unit (1, 2, 6) and the second half unit (7, 8, 10). The first half unit (1, 2, 6) is stored in the processing device 0, whereas the second half unit (7, 8, 10) is stored in the processing device 1.
While the substituting, merging, and dividing processes are being performed by the processing devices 0 and 1, a similar process sequence is performed by the processing devices 2 and 3. In other words, the data (4, 5, 11) stored in the processing device 2 and the data (0, 3, 9) stored in the processing device 3 are substituted and merged between the processing devices 2 and 3. Thus, the data (0, 3, 4, 5, 9, 11) is generated. The first half unit (0, 3, 4) is stored in the processing device 3, whereas the second half unit (5, 9, 11) is stored in the processing device 2.
By this process sequence, the substituting, merging, and dividing process for the first stage (stage 0) is completed.
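The substituting, merging, and dividing step of stage 0 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of any particular system; the function name `merge_and_split` and the flag `keep_second_half` are hypothetical names introduced here, and the input vectors are assumed to be already sorted.

```python
from heapq import merge

def merge_and_split(own, received, keep_second_half):
    """Merge two sorted vectors and keep one half unit (hypothetical helper).

    own, received: sorted lists held by a device and received from its partner.
    keep_second_half: False keeps the smaller (first) half unit,
                      True keeps the larger (second) half unit.
    """
    merged = list(merge(own, received))  # linear-time merge of two sorted inputs
    half = len(merged) // 2
    return merged[half:] if keep_second_half else merged[:half]

# Stage 0 between the processing devices 0 and 1.
dev0, dev1 = [2, 6, 10], [1, 7, 8]
new_dev0 = merge_and_split(dev0, dev1, keep_second_half=False)
new_dev1 = merge_and_split(dev1, dev0, keep_second_half=True)

# Stage 0 between the processing devices 2 and 3 (device 2 keeps the second half unit).
dev2, dev3 = [4, 5, 11], [0, 3, 9]
new_dev2 = merge_and_split(dev2, dev3, keep_second_half=True)
new_dev3 = merge_and_split(dev3, dev2, keep_second_half=False)
```

Note that both devices of a pair call the helper on the same two vectors, so each pair performs the same merge twice.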
Next, similar substituting, merging, and dividing processes are performed between another pair of processing devices. In other words, this process sequence is performed by a pair of the processing devices 0 and 2 and another pair of the processing devices 1 and 3. Between the processing devices 0 and 2, by the substituting and merging processes, the data (1, 2, 5, 6, 9, 11) is generated. The first half unit (1, 2, 5) is stored in the processing device 0, whereas the second half unit (6, 9, 11) is stored in the processing device 2. On the other hand, between the processing devices 1 and 3, by the substituting and merging processes, the data (0, 3, 4, 7, 8, 10) is generated. The first half unit (0, 3, 4) is stored in the processing device 1, whereas the second half unit (7, 8, 10) is stored in the processing device 3.
Thereafter, similar substituting, merging, and dividing processes are performed between the processing devices 0 and 1 and between the processing devices 2 and 3. Thus, the data (0, 1, 2), the data (3, 4, 5), the data (6, 7, 8), and the data (9, 10, 11) are stored in the processing devices 0, 1, 2, and 3, respectively. Thus, the sorting process is completed.
Next, the sorting system will be represented by a general equation. Now assume that the number of processing devices is N=2.sup.n and that processing devices P are denoted by P.sub.0, P.sub.1, . . . , and P.sub.N-1. In addition, assume that data V to be sorted is divided into N and then each divided data is stored in each processing device. Moreover, assume that data stored in the processing device P.sub.i is V.sub.i.
The processing device P.sub.i independently sorts the data V.sub.i stored therein.
Next, the substituting, merging, and dividing processes are repeated between pairs of processing devices in the network. The process sequence is organized into log.sub.2 N stages (namely, n stages). The n stages are denoted by S.sub.0, S.sub.1, . . . , and S.sub.n-1. Each stage S.sub.j has (j+1) sub stages. The (j+1) sub stages are denoted by s.sub.j0, s.sub.j1, . . . , s.sub.jj.
The substituting, merging, and dividing processes are performed once in each sub stage. In the sub stage s.sub.jk, the processing device P.sub.i substitutes the vector V.sub.i stored therein with the vector stored in the processing device whose number is given by the function A (i, j, k); merges its own vector V.sub.i with the vector received; and divides the merged vector into a first half unit and a second half unit. In the dividing process, if the function B (i, j, k)=0, the processing device P.sub.i stores the first half unit and discards the second half unit. On the other hand, if the function B (i, j, k)=1, the processing device P.sub.i stores the second half unit and discards the first half unit.
The functions A and B are given as follows.

$$A(i, j, k) = i \oplus 2^{j-k} \qquad (1)$$

$$B(i, j, k) = i_{j+1} \oplus i_{j-k} \qquad (2)$$

where $\oplus$ denotes the bitwise exclusive OR and $i_j$ is the value (0 or 1) of bit $b_j$ in the binary notation of $i$ ($b_{n-1}, b_{n-2}, \ldots, b_j, \ldots, b_1, b_0$).
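As a quick check of equations (1) and (2), the following worked evaluation uses the four-device example (n=2) at the sub stage s.sub.00 (j=0, k=0). It is only an illustration derived from the equations as stated, and it assumes that the operator in both equations is the bitwise exclusive OR.

```latex
\begin{align*}
A(2, 0, 0) &= 2 \oplus 2^{0} = 10_2 \oplus 01_2 = 11_2 = 3, \\
B(2, 0, 0) &= i_{1} \oplus i_{0} = 1 \oplus 0 = 1 \quad (i = 2 = 10_2), \\
B(3, 0, 0) &= i_{1} \oplus i_{0} = 1 \oplus 1 = 0 \quad (i = 3 = 11_2).
\end{align*}
```

Thus the processing device 2 is paired with the processing device 3 and keeps the second half unit, while the processing device 3 keeps the first half unit. This agrees with the four-device example, in which (5, 9, 11) is stored in the processing device 2 and (0, 3, 4) in the processing device 3.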
After the process sequence for the last sub stage s.sub.jj (where j=n-1) of the last stage S.sub.n-1 has been completed, the sorted vector can be obtained.
In the case of the system having four processing devices, since N=2.sup.2 =4, the sorting process is completed in two stages.
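The whole conventional procedure for the four-device example can be simulated sequentially, as in the sketch below. It is a minimal single-process simulation, not a parallel implementation; the names `partner`, `keeps_second_half`, and `parallel_sort` are hypothetical. It assumes that equation (1) is a bitwise exclusive OR, that bit n of a device number is 0, and that all vectors have the same length.

```python
from heapq import merge

def partner(i, j, k):
    # Equation (1): flip bit (j - k) of the device number i.
    return i ^ (1 << (j - k))

def keeps_second_half(i, j, k):
    # Equation (2): bit (j + 1) of i XOR bit (j - k) of i.
    # For j = n - 1, bit n of i is taken as 0 (assumption).
    return ((i >> (j + 1)) & 1) ^ ((i >> (j - k)) & 1)

def parallel_sort(vectors):
    """Simulate the n stages on a list of per-device vectors."""
    num = len(vectors)
    n = num.bit_length() - 1
    if num != 1 << n:
        raise ValueError("number of devices must be a power of two")
    data = [sorted(v) for v in vectors]      # each device sorts its own vector
    for j in range(n):                       # stage S_j
        for k in range(j + 1):               # sub stage s_jk
            nxt = []
            for i in range(num):
                merged = list(merge(data[i], data[partner(i, j, k)]))
                half = len(merged) // 2
                if keeps_second_half(i, j, k):
                    nxt.append(merged[half:])
                else:
                    nxt.append(merged[:half])
            data = nxt                       # all devices finish the sub stage together
    return data

result = parallel_sort([[2, 6, 10], [1, 7, 8], [4, 5, 11], [0, 3, 9]])
```

For this input, `result` reproduces the final distribution described in the example: (0, 1, 2), (3, 4, 5), (6, 7, 8), and (9, 10, 11) in the processing devices 0 to 3.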
FIG. 2 is a schematic diagram for explaining the order of process sequence of the stages for eight processing devices. Since the number of the processing devices is eight, N=2.sup.n =8, thereby n=3. Thus, the number of stages is three (S.sub.0, S.sub.1, S.sub.2). The stage S.sub.0 has one sub stage (s.sub.00). The stage S.sub.1 has two sub stages (s.sub.10 and s.sub.11). The stage S.sub.2 has three sub stages (s.sub.20, s.sub.21, and s.sub.22). The pair of processing devices which perform the substituting, merging, and dividing processes and the arrow direction (which represents which of first half unit or second half unit to be stored) for each sub stage depend on the above equations (1) and (2).
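The pairs and arrow directions for eight processing devices can be enumerated directly from equations (1) and (2), as in the following sketch. The function `schedule` and its output layout are hypothetical and are derived only from the equations, not from FIG. 2 itself; each tuple (x, y) stands for an arrow from the processing device x (which keeps the first half unit) to the processing device y (which keeps the second half unit).

```python
def partner(i, j, k):
    # Equation (1), assuming a bitwise exclusive OR.
    return i ^ (1 << (j - k))

def keeps_second_half(i, j, k):
    # Equation (2); bit n of i is taken as 0 in the last stage (assumption).
    return ((i >> (j + 1)) & 1) ^ ((i >> (j - k)) & 1)

def schedule(n):
    """Map each sub stage (j, k) to its list of arrows (x, y)."""
    plan = {}
    for j in range(n):
        for k in range(j + 1):
            arrows = []
            for i in range(1 << n):
                # Exactly one device of each pair has B = 0, so every
                # pair is listed once, from its first-half device.
                if keeps_second_half(i, j, k) == 0:
                    arrows.append((i, partner(i, j, k)))
            plan[(j, k)] = arrows
    return plan

plan = schedule(3)
for (j, k), arrows in plan.items():
    print(f"s_{j}{k}: {arrows}")
```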
FIG. 3 is a schematic diagram used to explain a process sequence performed in each sub stage of a conventional system.
With reference to FIG. 3, the process sequence performed in each sub stage will be described. Now assume that the substituting, merging, and dividing processes are performed between processing devices X and Y in a particular sub stage.
Each processing device has a buffer which stores a vector. For example, in the first stage (stage 0) of the system having four processing devices shown in FIG. 1, the buffer of the processing device 0 stores a vector (2, 6, 10), whereas the buffer of the processing device 1 stores a vector (1, 7, 8).
First, vectors which are stored in two processing devices X and Y are substituted with each other. In the example shown in FIG. 1, the processing device 0 sends the vector (2, 6, 10) to the processing device 1. On the other hand, the processing device 1 sends the vector (1, 7, 8) to the processing device 0. The processing devices 0 and 1 each store the received vector in another buffer thereof.
Thereafter, the processing devices 0 and 1 each merge the two vectors stored in the two buffers thereof. Thus, in each of the processing devices X and Y, the two vectors stored in the buffers are combined into a single sorted vector. In the example shown in FIG. 1, after the two vectors have been merged, a vector (1, 2, 6, 7, 8, 10) is generated. Thereafter, depending on the arrow direction determined by the equation (2), either the first half unit or the second half unit of the merged vector is stored. When the arrow points from the processing device X to the processing device Y, the first half unit is stored in the processing device X, whereas the second half unit is stored in the processing device Y. In FIG. 1, the first half unit (1, 2, 6) is stored in the buffer of the processing device 0, whereas the second half unit (7, 8, 10) is stored in the buffer of the processing device 1. Thus, the substituting, merging, and dividing processes for one sub stage are completed.
As described above, in the conventional system, by repeating this process sequence (namely, the substituting, merging, and dividing processes) between two of a plurality of processing devices, vectors stored in all the processing devices are sorted.
FIG. 4 is a schematic diagram used to explain a conventional system.
As shown in the figure, two of a plurality of processing devices connected in a network are designated as a pair. Between such a pair of processing devices, vectors stored therein are substituted. Each processing device then merges a vector stored therein and another vector received from the other processing device, and divides the merged vector into the first and second half units. Thereafter, between another pair of two processing devices, the substituting, merging, and dividing processes are performed. By repeating this process sequence for all the processing devices, all the vectors are sorted.
However, the conventional system has the following problems.
First, the system which uses processing devices connected in a sorting-dedicated network is not practical. Specifically, in most applications, the sorting process is not used independently. Rather, the sorting process is used for arranging the results of other processes or as a pre-process thereof. Thus, it is not practical to construct a connection network dedicated to the sorting process.
In addition, in the conventional sorting system using a general-purpose connection network, the amount of communication performed between two processing devices for the vector substituting process is large, resulting in an increase of the load of the communicating process.
In other words, in the conventional system, when the number of processing devices is N=2.sup.n, the number of stages is n, and the stage S.sub.j has (j+1) sub stages, each of which requires one communicating process. The sorting process for all the stages therefore requires the communicating process (1+2+ . . . +n) times (namely, n(n+1)/2 times). Thus, the number of times the communicating process is required is given by the following equation.

$$M = \frac{\log_2 N \, (\log_2 N + 1)}{2}$$
where M denotes the number of times the communicating process is required. The amount of communication is expressed by multiplying M by the number of elements of the vectors.
When N=4, the number of times the communicating process is required is 3 (M=3). When N=8, M=6. Thus, as the number of the processing devices increases, the value of M further increases. For example, when N=256=2.sup.8, M=36. When N=1024=2.sup.10, M=55. In the conventional system, the communicating process therefore requires a large amount of the resources of the system.
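The values of M quoted above can be reproduced with a short calculation. The sketch below is illustrative only; the function names `communication_rounds` and `elements_sent_per_device` are hypothetical, and the second function assumes that every processing device sends its whole vector of a fixed length in each sub stage.

```python
def communication_rounds(num_devices):
    """M = n(n + 1)/2 for N = 2**n processing devices."""
    n = num_devices.bit_length() - 1
    if num_devices < 1 or num_devices != 1 << n:
        raise ValueError("number of devices must be a power of two")
    return n * (n + 1) // 2

def elements_sent_per_device(num_devices, vector_length):
    # Amount of communication: M multiplied by the number of vector elements.
    return communication_rounds(num_devices) * vector_length

for num in (4, 8, 256, 1024):
    print(num, communication_rounds(num))
```

For the four-device example with three elements per vector, `elements_sent_per_device(4, 3)` gives 9 elements sent by each processing device.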
In the conventional sorting system used in the general-purpose connection network, since all the vectors which have been sorted are substituted between each processing device and then the substituted vectors are merged, the same merging process is redundantly performed in two processing devices.
Thus, the present invention seeks to overcome the above-described problems of conventional sorting systems.