1. Field of the Invention
The present invention relates to a means of achieving high functionality and high-speed operation in the data transfer component of parallel processing systems, which are widely expected to serve high-speed processing applications in the computer field.
2. Description of the Prior Art
Widespread use of large-scale mathematical simulations has significantly increased demand for higher operating speeds in computer processing systems. Parallel processing systems have been developed as one of the most promising future supercomputer technologies, and various systems have been described in the literature.
In a parallel processing architecture, however, data is transferred between processor elements far more frequently than in a single-processor system, and the performance and functionality of the data transfer operation significantly affect overall system performance. More specifically, the greatest problems faced in improving the performance of a parallel processing computer are the performance of the individual processors, the software, and the processor-to-processor data transfer capacity and functionality. This has led to numerous proposals relating specifically to transferring data between processors.
A typical parallel processing system according to the prior art is described below with reference to FIGS. 10 and 11, which are a block diagram of the conventional parallel processing system and a diagram of the data packet configuration, respectively. This device was proposed in Japanese Patent Laid-Open No. S63-124162.
As shown in FIG. 10, this device comprises row crossbar switches 50a and 50b, column crossbar switches 51a and 51b, and element processors 53a-53d. Each element processor 53 comprises input and output ports to row and column crossbar switches 50 and 51. Each data packet (FIG. 11) comprises a header, which contains two switch addresses EW and SN and a routing reset bit R, and a data area.
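The packet layout of FIG. 11 can be pictured with the following minimal sketch. The class name, field names, and Python types are assumptions made only for illustration; the source states only that the header holds the two switch addresses EW and SN and the routing reset bit R, followed by a data area, and gives no field widths.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Illustrative model of the prior-art packet (hypothetical names)."""
    ew: int            # switch address EW
    sn: int            # switch address SN
    r: int = 0         # routing reset bit R, 0 on the first attempt
    data: bytes = b""  # data area; the header carries no length field


# A packet whose two switch addresses are both 1.
pkt = Packet(ew=1, sn=1, data=b"payload")
```

Note that the sketch has no length field in the header, mirroring the prior-art layout.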
The operation whereby a packet is transferred from one element processor 53a to another element processor 53d in this conventional parallel processing system is described below.
The packet is transferred in sequence from the element processor 53a to the row crossbar switch 50a, element processor 53c, column crossbar switch 51b, and then to the element processor 53d. The switch addresses EW and SN specify the column and row, respectively, for this operation. In this example both addresses are set to 1. If an error is detected on this route, the routing reset bit R is set to 1, and the packet is resent. If, in this example, an error occurs in the intermediate element processor 53c, the packet is sent the next time from the element processor 53a to the column crossbar switch 51a, element processor 53b, row crossbar switch 50b, and then to the addressed element processor 53d. It is therefore possible to transfer data packets between element processors 53 even if an error occurs in one of the element processors 53 used for routing.
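The routing and retry behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited application: the 2x2 grid layout is inferred from the two routes given in the text, and the reading that R=0 selects the row crossbar switch first while R=1 selects the column crossbar switch first is likewise an inference from the example. The names GRID, route, and send are hypothetical.

```python
# Assumed layout, inferred from the routes in the text: (row, column) -> processor.
GRID = {(0, 0): "53a", (0, 1): "53c", (1, 0): "53b", (1, 1): "53d"}
ROW_SWITCH = {0: "50a", 1: "50b"}  # one row crossbar switch per row
COL_SWITCH = {0: "51a", 1: "51b"}  # one column crossbar switch per column


def route(src, ew, sn, r):
    """List the hops from src=(row, col) to the processor at column ew, row sn.

    r == 0: row crossbar switch first (assumed default order).
    r == 1: column crossbar switch first (assumed alternate order).
    """
    row, col = src
    hops = [GRID[src]]
    order = ["row", "col"] if r == 0 else ["col", "row"]
    for kind in order:
        if kind == "row" and col != ew:
            hops.append(ROW_SWITCH[row])
            col = ew
            hops.append(GRID[(row, col)])
        elif kind == "col" and row != sn:
            hops.append(COL_SWITCH[col])
            row = sn
            hops.append(GRID[(row, col)])
    return hops


def send(src, ew, sn, faulty=frozenset()):
    """Try R=0 first; on an error at an intermediate hop, resend with R=1."""
    for r in (0, 1):
        hops = route(src, ew, sn, r)
        if not any(h in faulty for h in hops[1:-1]):
            return r, hops
    raise RuntimeError("no fault-free route")


first_try = route((0, 0), ew=1, sn=1, r=0)
retry_bit, retry_hops = send((0, 0), ew=1, sn=1, faulty={"53c"})
```

With both addresses set to 1 and no fault, the first route passes through 50a, 53c, and 51b; with 53c marked faulty, the resend uses 51a, 53b, and 50b, matching the two routes in the text.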
However, the following problems are presented by this configuration.
First, Japanese Patent Laid-Open No. S63-124162 describes only the method of sending data (called "storing") from one element processor to another element processor, and does not describe the process whereby one element processor reads data (called "loading") from another element processor. The loading operation, however, is easier to handle directly in software and is preferable because of the greater flexibility it permits in the software. Moreover, if both loading and storing operations are supported, the memory distributed among the processor elements can be freely accessed from any part of the architecture. This results in more flexible software, and a system with higher general utility.
A further drawback is the need to use a single, common packet length throughout the system because the header contains no packet length information. This means that packets of different lengths cannot be handled in this system.
In addition, broadcasting data from one element processor to all other element processors is only possible by addressing the data individually to each of the other element processors.
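The cost of this limitation can be illustrated with a short sketch. The function name and the representation of a packet as a (destination, data) pair are assumptions for illustration only; the point is that the sender must issue one individually addressed packet per destination.

```python
def broadcast_by_unicast(src, processors, data):
    """Emulate a broadcast by building one addressed packet per destination."""
    return [(dst, data) for dst in processors if dst != src]


packets = broadcast_by_unicast("53a", ["53a", "53b", "53c", "53d"], b"x")
```

For a system of N element processors, the sender therefore transmits N-1 separate packets for each broadcast.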
Finally, this architecture requires data locking measures when a large number of packets is transferred. No such measures are disclosed in the application.