The invention relates generally to a parallel computer and, more particularly, to a parallel computer in which the number of processors connected to a cluster bus can be rearranged during operation of the computer.
FIG. 11 is a block diagram of a conventional parallel computer which is disclosed in "Parallel circuit simulation machine Cenju", Nakata et al., Information Processing, Vol. 31, No. 5, Total No. 303, May 15, 1990, pp. 593-601. In FIG. 11, the parallel computer "Cenju" has processors (PE1-PE64). The eight processors connected to each cluster bus 2 form a cluster 11. Each cluster bus 2 connects the PEs in its cluster 11. A multi-connection network 3 connects the clusters through the network processors (NWP) 4. The network processors (NWP) 4 and the network adapters (NADA) 5 support data transfer between the clusters 11. This system uses a distributed shared memory scheme: a memory is located in each PE, and each memory has a unique address in the system.
FIG. 12 is a block diagram of the conventional cluster 11 in FIG. 11. In the figure, there are eight processing means 9 in a cluster. Each distributed shared memory 8 at each processing means 9 has a unique address in the system and operates as a local memory. The cluster bus 2 connects the processors 1 and the distributed shared memories 8 in the cluster 11. The processor 1 which is connected directly to the cluster bus 2 is called an "owner processor", and the processor 1 which is connected to the cluster bus 2 via the distributed shared memories 8 is called a "non-owner processor".
The operation of the conventional parallel processor will be described hereinafter. In the conventional parallel processor system of FIG. 11, data transfer between the processors 1 is executed via the distributed shared memories 8. A particular processor 1 can directly access the distributed shared memories 8 of other processors via the cluster bus 2 if those distributed shared memories 8 are located in the same cluster. If the desired distributed shared memories 8 are located in another cluster, the processor 1 accesses them via the cluster bus 2, the network processors (NWP) 4, the network adapters (NADA) 5 and the multi-connection network 3. Both kinds of transfer require the same hardware setup and the same application program, supported by the basic software, but the actual data access rates differ greatly between them.
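The two access paths described above can be sketched as follows. This is a minimal illustration only, not part of the disclosed system: it assumes that the PEs are renumbered 0 to 63 and that cluster k holds PEs 8k to 8k+7, and the names cluster_of and access_path are hypothetical.

```python
from typing import List

# Assumptions (not from the source): PEs are renumbered 0..63 and
# cluster k holds PEs 8k..8k+7.
PES_PER_CLUSTER = 8   # eight processors per cluster bus
NUM_PES = 64          # PE1..PE64 in FIG. 11, renumbered 0..63 here


def cluster_of(pe: int) -> int:
    """Return the index of the cluster that contains the given PE."""
    if not 0 <= pe < NUM_PES:
        raise ValueError(f"PE index out of range: {pe}")
    return pe // PES_PER_CLUSTER


def access_path(src_pe: int, dst_pe: int) -> List[str]:
    """List the shared components crossed when src_pe accesses dst_pe's memory."""
    src, dst = cluster_of(src_pe), cluster_of(dst_pe)
    if src_pe == dst_pe:
        return []                          # own memory: no shared interconnect
    if src == dst:
        return [f"cluster bus {src}"]      # same cluster: cluster bus only
    return [                               # other cluster: full network path
        f"cluster bus {src}",
        f"NWP/NADA {src}",
        "multi-connection network",
        f"NWP/NADA {dst}",
        f"cluster bus {dst}",
    ]
```

Under these assumptions, access_path(0, 7) crosses a single cluster bus, whereas access_path(0, 8) crosses five components, which illustrates why the access rates of the two kinds of transfer differ even though the application program is the same.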
The operation of the system of FIG. 12 will be described hereinafter. During the operation of the conventional parallel processing computer 9 shown in FIG. 12, each processor 1 can access the data stored in its own distributed shared memory 8 directly and at high speed. If the processor 1 needs to access data stored in other distributed shared memories 8, it must access those distributed shared memories 8 via the cluster bus 2. Each processor 1 has two port memories: one for direct access to its own distributed shared memory 8, and the other, connected to the cluster bus 2, for accessing other distributed shared memories 8 via the cluster bus 2.
As described above, the processor 1 has two port memories. The processor 1 is able to access other distributed shared memories 8 without disturbing access to its own memory. However, if a plurality of accesses are carried out at the same time between the non-owner processors 1, the order of access on the cluster bus must be mediated, since the processors 1 compete with each other for access. Accordingly, it may be necessary for a processor 1 to wait for an opportunity to access other distributed shared memories 8. Since computing and data transfer are carried out by time sharing in one processor 1 of the system, an increase in the number of data transfers affects the computing time and degrades the performance of the system.
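The waiting caused by mediation on the cluster bus can be illustrated with a minimal sketch. The source does not specify the mediation policy; first-come, first-served order and a fixed number of bus cycles per transfer are assumptions made here, and the function name arbitrate is hypothetical.

```python
from collections import deque
from typing import Dict, Iterable


def arbitrate(requesters: Iterable[int], cycles_per_transfer: int = 1) -> Dict[int, int]:
    """Grant the bus to one requester at a time, in arrival order.

    Assumption: first-come, first-served mediation with a fixed transfer
    length. Returns, for each processor, the number of cycles it waits
    before its transfer starts.
    """
    queue = deque(requesters)
    waits: Dict[int, int] = {}
    now = 0
    while queue:
        pe = queue.popleft()
        waits[pe] = now               # cycles spent waiting for the bus
        now += cycles_per_transfer    # the bus is busy for this transfer
    return waits
```

For example, with three simultaneous requesters and four cycles per transfer, the third processor in line waits eight cycles. In the conventional system that waiting time is taken from the same processor 1 that performs the computing.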
In the conventional parallel computer, the parallel processes needed to perform a data transfer within a cluster differ from those needed between clusters. Therefore, if the load from the application program which allocates the clusters changes, the program cannot adapt flexibly to the allocation of clusters. Accordingly, the processors 1 in a cluster 11 either have to wait, unconnected, for a chance to make a connection, or have to use another processor 1 in another cluster 11 at the cost of degraded data transfer performance.
As the number of clusters or the number of processes between the clusters 11 increases, the number of accesses also increases and the clusters compete for the multi-connection network 3. Sometimes the processors 1 are able to access the other distributed shared memories 8 smoothly via the multi-connection network 3, but at other times they cannot and must wait until the multi-connection network 3 becomes available. Therefore, the processing period may differ for the same process. In a real-time processing system, if the processing period differs for the same process, different results may be obtained for the same process. Accordingly, it is desirable that the processing system be able to complete the same process in the same period.
As described above, the conventional parallel computer shown in FIG. 11 has several problems. The processors 1 are not used effectively. Real-time processing cannot be assured since the clusters do not operate independently. Data transfer efficiency is degraded as a result.
In the conventional parallel computer shown in FIG. 12, non-owner processors 1 must wait to access the distributed shared memories 8 if a plurality of accesses are executed at the same time, since competition for the cluster bus 2 occurs. If the waiting control is carried out by a hardware controller or a software controller, each processor 1 incurs excessive overhead. Since data transfers are carried out when data is generated or when data is needed, data transfers are sometimes concentrated. This causes the problems that the processors 1 are not used effectively and that the data transfer efficiency of the data buses is degraded.
It is a primary object of the present invention to provide a parallel computer which is able to reconstruct the clusters during operation of the computer. By changing the number of processors 1 in each cluster in response to variation in the load of the application programs allocated to each cluster, the resources of the processors 1 can be used effectively. Real-time processing and reproducibility can be assured because of the independence between the clusters.
It is a further object of the present invention to increase the data transfer efficiency of the cluster buses in two ways: by decreasing the transfer overhead of the processor 1 through the use of a separate data transfer processor, which controls the transfer of data between the processors 1 and the distributed shared memories 8 via the cluster buses 2; and by increasing the degree of freedom in selecting the data transfer timing, so that it is independent of the time at which the data is generated and the time at which the data is needed.
It is a further object of the present invention to increase the data transfer efficiency of the cluster buses by decreasing the transfer overhead spent waiting for synchronous operation in the processors 1.
It is a further object of the present invention to provide compilers which produce the transfer programs needed for the new architecture. Since the compiler extracts the data transfer program automatically from the arithmetic program, the programmer can write programs without any knowledge of the architecture of the system.
It is a still further object of the present invention to increase the data transfer efficiency of the cluster buses 2 by reducing the overhead during a normal data transfer. An interrupt signal is generated only when there is no data in the FIFO 33 for the transfer processor 31 to access. As a result, the transfer processor 31 starts the transfer operation assuming that data is ready to be transferred, and waits for the data only when the data transfer has failed.
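The transfer behaviour stated in this object can be illustrated with a minimal software sketch. It is a toy model under stated assumptions, not the disclosed hardware: the class name TransferProcessor, its methods, and the use of an exception to stand in for the interrupt signal are all hypothetical.

```python
from collections import deque
from typing import Deque, List


class TransferProcessor:
    """Toy model: read the FIFO first and handle 'empty' only when it occurs."""

    def __init__(self) -> None:
        self.fifo: Deque[int] = deque()   # stands in for FIFO 33
        self.sent: List[int] = []         # words placed on the cluster bus
        self.interrupts = 0               # count of "FIFO empty" signals

    def produce(self, word: int) -> None:
        """The producing processor pushes a word into the FIFO."""
        self.fifo.append(word)

    def try_transfer(self) -> bool:
        """Attempt one transfer without checking the FIFO status beforehand."""
        try:
            word = self.fifo.popleft()    # optimistic read
        except IndexError:
            self.interrupts += 1          # empty FIFO stands in for the interrupt
            return False                  # caller waits and retries later
        self.sent.append(word)
        return True
```

In this sketch, a transfer that finds data costs only the read itself; the status handling is paid for solely in the failure case. Producing two words and then attempting three transfers yields two successes, one failure and a single interrupt count.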