This invention relates to a data transfer device for transferring data between processor elements, and for transmitting and receiving data between any one of the processor elements and the outside, in a multiprocessor system that performs parallel processing using a plurality of processor elements.
Recently, as data processing has grown in scale, high-performance computer systems have come to be required. In a parallel computer, a plurality of processor elements share the processing in order to enhance performance. In general, parallel processing requires data transfer between the processor elements as the processing in the plural processor elements proceeds. Various kinds of interconnection networks (data transfer networks) have been proposed for communication among such processor elements. Among them, a crossbar-type network is a complete interconnection network which can perform communication between arbitrary processor elements in a single data transfer.
FIG. 22 shows a construction of a conventional multiprocessor system in which the plural processor elements (PE) are connected by a crossbar-type network. In the figure, reference numeral 50 indicates a processor element on the transmission side, 55 indicates a processor element on the receiving side, 60 indicates a data transfer control device on the transmission side, 65 indicates a data transfer control device on the receiving side, and 70 indicates a buffer unit serving as a data transfer channel (joint node).
How the crossbar-type network logically expressed in FIG. 22 is realized in actual hardware depends on the respective parallel computer. Also, the sequence for accessing the network as the processing proceeds depends on the contents to be processed. FIG. 23 shows an example of the buffer unit 70 of FIFO type (first-in-first-out buffer 71). The processor element 50 on the transmission side can transmit data to the channel of the FIFO buffer 71 while the FIFO buffer 71 has a vacancy. The processor element 55 on the receiving side reads the data from the FIFO buffer 71 when it is not empty. Accordingly, unlike a case where the joint node is composed of a mere switch, it is unnecessary to set the connection state of the whole network.
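The handshake of the FIFO-type buffer unit can be illustrated with a minimal software sketch. The class name FifoChannel, its method names, and the depth parameter are hypothetical and are introduced only for illustration; the actual buffer unit 70 is hardware, and its depth and signalling are not specified here.

```python
from collections import deque


class FifoChannel:
    """Hypothetical software model of one FIFO buffer unit (joint node)."""

    def __init__(self, depth):
        self.depth = depth      # assumed number of words the FIFO can hold
        self.slots = deque()    # oldest word sits at the left end

    def has_vacancy(self):
        return len(self.slots) < self.depth

    def is_empty(self):
        return len(self.slots) == 0

    def send(self, word):
        """Transmission side: write only while the FIFO has a vacancy."""
        if not self.has_vacancy():
            return False        # sender must wait and retry
        self.slots.append(word)
        return True

    def receive(self):
        """Receiving side: read only when the FIFO is not empty."""
        if self.is_empty():
            return None         # nothing to read yet
        return self.slots.popleft()
```

Because each channel decides locally whether a write or a read may proceed, no global setting of the network connection state is needed in this model.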
Shown in FIGS. 24 and 25 is an example of generation of a buffer unit address (channel number) for specifying the target to which data is to be transmitted in the construction shown in FIGS. 22 and 23. FIG. 24 shows a construction of a conventional address generation circuit 61 built into the data transfer control device 60 on the transmission side. In FIG. 24, the address generation circuit 61 includes an n-bit pointer 62 and a +1 adder 63. Under this construction, when data is transmitted from the processor elements 50 on the transmission side to the processor elements 55 on the receiving side, a buffer unit address A held in the n-bit pointer 62 is incremented in response to an address update requirement signal CNT at every data transfer, so that target channel numbers are specified sequentially, as shown in FIG. 25. This is called "burst transfer", which requires only pulses of the CNT signal and no additional time for specifying the channels. Thus, the data transfer rate is high.
In the address generation circuit 61 of FIG. 24 in the conventional data transfer device, the n-bit pointer 62 can count only 2 to the n-th power values; for example, if the pointer has two bits, it counts 0, 1, 2, 3, 0, 1, 2, 3 . . . Therefore, the variation range of the address is fixed and the address generation is limited to sequential generation.
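The counting behavior of the conventional address generation circuit 61 can be sketched in software as follows. This is only an illustrative model: the names AddressGenerator, pulse_cnt, and burst_addresses are hypothetical, and it is assumed that each CNT pulse outputs the address currently held in the pointer and then increments it.

```python
class AddressGenerator:
    """Hypothetical model of the n-bit pointer 62 with the +1 adder 63."""

    def __init__(self, n_bits, start=0):
        self.mask = (1 << n_bits) - 1   # pointer holds values 0 .. 2**n - 1
        self.pointer = start & self.mask

    def pulse_cnt(self):
        """One CNT pulse: return the current address, then add 1 (wrapping)."""
        address = self.pointer
        self.pointer = (self.pointer + 1) & self.mask
        return address


def burst_addresses(n_bits, count, start=0):
    """Channel numbers produced by `count` consecutive CNT pulses."""
    gen = AddressGenerator(n_bits, start)
    return [gen.pulse_cnt() for _ in range(count)]


print(burst_addresses(2, 8))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

In this model the only free parameters are the pointer width and the starting value, so neither the range of channel numbers nor the step between them can be adapted to the processing.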
Consequently, a multiprocessor system using a data transfer network whose size is fixed for this reason cannot optimize the network size for the contents to be processed, so that the parallel processing rate is lowered. Further, in the case where the processor elements are grouped and data transfer with a different purpose is to be conducted for each group, the burst transfer is not available. This means that another data transfer method with a lower transfer rate must be employed.
Moreover, in a multiprocessor system using a data transfer network which generates only sequential addresses as mentioned above, three-dimensional array data, for example, are processed with low efficiency, because skipped address values often occur when data are distributed and collected in the multiprocessor system.
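How skipped address values can arise with three-dimensional data may be seen in the following sketch. The mapping used here is purely an assumption for illustration and is not taken from the conventional device: the processor elements are imagined as a logical px by py by pz grid, and the element at (x, y, z) is given channel number x + px * (y + py * z). All function names are hypothetical.

```python
def channel_number(x, y, z, px, py):
    """Assumed linear channel number of the processor element at (x, y, z)."""
    return x + px * (y + py * z)


def channels_along_axis(axis, fixed, shape):
    """Channel numbers visited when stepping along one axis of the grid.

    axis  : 0, 1 or 2 (the coordinate that varies)
    fixed : (x, y, z) starting coordinates; the entry at `axis` is overwritten
    shape : (px, py, pz) size of the assumed processor grid
    """
    px, py, _pz = shape
    coords = list(fixed)
    result = []
    for i in range(shape[axis]):
        coords[axis] = i
        result.append(channel_number(coords[0], coords[1], coords[2], px, py))
    return result


def is_sequential(addresses):
    """True if every address is exactly the previous one plus 1."""
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))


shape = (2, 2, 2)
print(channels_along_axis(0, (0, 0, 0), shape))  # [0, 1]
print(channels_along_axis(1, (0, 0, 0), shape))  # [0, 2]
print(channels_along_axis(2, (0, 0, 0), shape))  # [0, 4]
```

Under this assumed mapping, only transfers along the first axis yield consecutive channel numbers; transfers along the other two axes step by 2 and 4, which a pointer that can only add 1 cannot produce in a burst.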