1. Field of the invention
The present invention relates to a parallel computer system including processing elements and, in particular, to a parallel computer system including a plurality of processing elements and performing data transmissions among the processing elements.
2. Description of the prior art
In recent years, much research has been conducted toward the realization of practical parallel computer systems. In particular, with the advancement of semiconductor technology, a communication control unit and a data processing unit can now be realized together as a processing element on a single LSI chip, and a number of research efforts aim to realize a parallel computer by connecting a number of such processing element LSIs.
As an example of such a parallel computer, a large-scale data-flow computer named EDDEN (Enhanced Data Driven ENgine), in which a maximum of 1024 one-chip processing elements are connected, is currently under development, as disclosed on pages 1048-1049 of book 2T-2 of the proceedings of the 38th conference of the Information Processing Society. In such a large-scale data-flow computer, all communications among processing elements are bi-directional, and the distance between any two processing elements in such data communication can be minimized.
Furthermore, it has been proposed to use bi-directional parallel communication lines as the communication links connecting processing elements, since this would make it possible to use the same communication link for both transmitting and receiving data and would reduce the number of signal lines used to connect processing elements in a large-scale data-flow computer.
As done in the EDDEN system above, the implementation of the bi-directional communication among the processing elements can not only improve communication efficiency but can also keep the uniformity of the communication network. Furthermore, by having transmissions in both directions share the same communication line, the number of input/output terminals of a processing element can be reduced, and a processing element can then be realized on a single LSI chip.
In a conventional parallel computer, when a packet consisting of a plurality of words is being transmitted in one direction, the packet may come to be spread over multiple processing elements. If another packet being transmitted in the opposite direction is similarly spread over multiple processing elements, both packets come to a halt, each waiting for the other packet's transmission to complete. The result is a deadlock in which neither packet will ever be able to move again.
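The circular wait described above can be illustrated with a minimal sketch. The model is an assumption made purely for illustration and is not taken from any particular system: each shared link is treated as a resource occupied by the packet currently straddling it, and each halted packet needs exactly one further link. The names `has_circular_wait`, `holds` and `wants` are hypothetical.

```python
def has_circular_wait(holds, wants):
    """Return True if the packets form a circular wait (a deadlock).

    holds: dict mapping packet name -> set of links the packet occupies
    wants: dict mapping packet name -> the link it needs next (or None)
    Illustrative model only: one link is needed per halted packet.
    """
    # Map each occupied link to the packet that occupies it.
    owner = {link: pkt for pkt, links in holds.items() for link in links}

    # Build the wait-for relation: packet -> packet blocking it.
    waits_for = {}
    for pkt, link in wants.items():
        blocker = owner.get(link)
        if blocker is not None and blocker != pkt:
            waits_for[pkt] = blocker

    # Follow each chain of waits; revisiting a packet means a cycle.
    for start in waits_for:
        seen = set()
        node = start
        while node in waits_for:
            if node in seen:
                return True
            seen.add(node)
            node = waits_for[node]
    return False


# Packet A travels rightward and straddles link "0-1"; packet B travels
# leftward and straddles link "1-2". Each needs the link the other occupies.
deadlocked = has_circular_wait(
    holds={"A": {"0-1"}, "B": {"1-2"}},
    wants={"A": "1-2", "B": "0-1"},
)

# If B is held entirely within one element, it occupies no link,
# so A can proceed and no circular wait arises.
not_deadlocked = has_circular_wait(
    holds={"A": {"0-1"}, "B": set()},
    wants={"A": "1-2", "B": "0-1"},
)
```

In this sketch the first case reports a deadlock because each packet waits for a link occupied by the other, whereas the second case does not because only one packet is waiting.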
Such a deadlock can be avoided by equipping each processing element with a data buffer whose size is equal to the number of words in a packet. An entire packet is then stored within the data buffer of a single processing element, which prevents the packet from spreading out over multiple processing elements as described above. In other words, if the number of words in a packet is fixed, deadlocks can be avoided by equipping each processing element with a data buffer whose size is equal to the number of words in a packet.
However, when a structure packet, e.g., a one-dimensional vector, having a large and variable number of words is transmitted between processing elements, the above-mentioned deadlock cannot be avoided even when each of the processing elements is equipped with the data buffer described above. This is because, when the number of words in a structure packet exceeds the size of the data buffer in each of the processing elements, the packet again becomes spread out over multiple processing elements.
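The relation between packet length and buffer size can be made concrete with a short sketch. It assumes, for illustration only, that each processing element can hold at most `buffer_words` words of a given packet, so that a packet of `packet_words` words occupies at least the number of elements computed below. The function names and the example sizes are hypothetical.

```python
def elements_spanned(packet_words, buffer_words):
    """Minimum number of processing elements a packet occupies.

    Assumes each element buffers at most `buffer_words` words of the
    packet (an illustrative simplification).
    """
    if packet_words <= 0 or buffer_words <= 0:
        raise ValueError("word counts must be positive")
    # Ceiling division using integer arithmetic only.
    return -(-packet_words // buffer_words)


def fits_in_one_element(packet_words, buffer_words):
    """True if the whole packet can be held by a single element."""
    return elements_spanned(packet_words, buffer_words) == 1


# A fixed 4-word packet with a 4-word buffer stays within one element.
fixed_span = elements_spanned(4, 4)

# A 10-word structure packet with the same 4-word buffer spreads over
# at least three elements, so the deadlock condition can recur.
vector_span = elements_spanned(10, 4)
```

Under these assumptions, a buffer sized for a fixed packet length protects only packets of that length or shorter; any longer structure packet again spans more than one processing element.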