1. Field of the Invention
The present invention relates to a tightly coupled multiprocessor system in which a plurality of processors (CPUs) are interconnected through a shared bus.
2. Description of the Background Art
A conventional shared-memory multiprocessor system has a general configuration as shown in FIG. 1, in which a plurality of processors (processor-1 to processor-n) 1000-1 to 1000-n, each having a CPU and a cache memory, and a plurality of shared memories (memory-1 to memory-m) 1001-1 to 1001-m are interconnected through a shared bus 1002 or a connection path using switches, such that each of the shared memories 1001-1 to 1001-m can be accessed equally from any one of the processors 1000-1 to 1000-n. Consequently, the parallel programs executed on such a shared-memory multiprocessor system can make memory accesses without distinguishing among the shared memories 1001-1 to 1001-m, and the variables shared between one process and another process can be accessed from the one process regardless of whether or not the other process is executed over a plurality of processors.
Now, the execution of the parallel programs requires more than accesses to the shared variables. In fact, a majority of the memory accesses required in the execution of the parallel programs are accesses to the local variables of each program, so that the memory access performance for the local variables has the dominant influence on the overall performance of the multiprocessor system.
In the shared-memory multiprocessor system having the configuration of FIG. 1, even for an access to a local variable not shared among the processors, when a cache miss occurs, a memory access command is transmitted to the shared bus 1002 just as in the case of an access to a shared variable, so as to refill the cache memory by accessing the shared memories. This scheme is advantageous in that there is no need to pay attention to whether or not a variable is shared among the processors, so that the constraints imposed when writing the parallel programs can be reduced. However, in this scheme, it is impossible to improve the memory access performance for the local variables.
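The behavior described above can be illustrated by a minimal sketch in Python. This is only an illustrative model and not part of the described system: the class names `SharedBus` and `Fig1Processor`, the set-based cache, and the two addresses are all assumptions introduced here. The sketch merely counts the memory access commands that reach the shared bus, and shows that a cache miss on a local variable costs a bus transaction just as a miss on a shared variable does.

```python
class SharedBus:
    """Hypothetical stand-in for shared bus 1002; counts access commands."""

    def __init__(self):
        self.transactions = 0

    def refill(self, address):
        # Every cache refill is a memory access command on the shared bus.
        self.transactions += 1


class Fig1Processor:
    """Hypothetical processor of FIG. 1: a CPU with a private cache."""

    def __init__(self, bus):
        self.bus = bus
        self.cache = set()  # addresses currently held in the cache

    def access(self, address):
        """Return True on a cache hit, False on a miss.

        On a miss the cache is refilled over the shared bus, whether the
        address holds a local variable or a shared variable.
        """
        if address in self.cache:
            return True
        self.bus.refill(address)
        self.cache.add(address)
        return False


bus = SharedBus()
cpu = Fig1Processor(bus)
LOCAL_VAR, SHARED_VAR = 0x100, 0x200  # hypothetical addresses

cpu.access(LOCAL_VAR)   # miss: uses the bus although the variable is local
cpu.access(SHARED_VAR)  # miss: uses the bus
cpu.access(LOCAL_VAR)   # hit: no bus traffic
print(bus.transactions)
```

In this model the only way to keep local-variable traffic off the bus is a cache hit, which is the limitation the paragraph above points out.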
In order to obtain a high system performance for a tightly coupled multiprocessor system, it is important to reduce the traffic on the shared bus so that an increased number of processors can be coupled together by the shared bus. In the shared-memory multiprocessor system having the configuration of FIG. 1, however, the shared bus must be used even for the accesses to the local variables, so that it is practically impossible to couple many processors together in this system, i.e., the extensibility of the system is severely limited.
Now, in recent years, due to the significant progress made in semiconductor LSI technology, it has become possible for a general purpose computer to be implemented by integrating not just a central processing unit (CPU) but also peripheral devices such as a cache memory and a memory management unit (MMU) on a single chip called a microprocessor. Likewise, an integration level of 4 or 16 Mbits per chip has been realized for the DRAM used as a main memory, and research aimed at even higher integration levels is in progress. Owing to such technological advances, a small scale computer called a workstation can be implemented entirely on a single processor board, including control units for the peripheral devices such as disk devices and LAN data lines. In such a workstation, a high performance level can be realized by tightly coupling the memory and the CPU, to such an extent that the processing performance of the CPU itself can be made comparable to that of the current generation of general purpose computers. As such, by utilizing present day technology, it is practically possible to implement the CPU and the memory of a processor on the same processor board.
An alternative conventionally proposed shared-memory multiprocessor system called a distributed shared-memory multiprocessor system has a general configuration as shown in FIG. 2, in which shared memories are distributed over a plurality of processor boards as main memories of the CPUs on the processor boards. Namely, in this configuration of FIG. 2, each one of the processors 1010-1 to 1010-n has a CPU, a cache memory, a main memory, and a bus driving buffer, all of which are implemented on a single processor board, and is coupled with the other processors by a shared bus 1012 via the bus driving buffer provided on each processor board.
In this configuration of FIG. 2, the local variables to be used by the CPU on each processor board are allocated to the main memory of the same processor board, such that the memory accesses to the local variables can be realized without using the shared bus 1012. The shared variables are allocated to the main memory of one of the processor boards whose CPUs are sharing these shared variables, such that the memory accesses to the shared variables can be achieved through the shared bus 1012.
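The allocation scheme of FIG. 2 can be sketched as follows, purely for illustration. The `Bus` and `Board` classes, the contiguous address range assigned to each board, and the specific addresses are assumptions made for this sketch and are not taken from the described system. Cache memories are omitted so that only the routing decision is visible: an access that falls in the board's own main memory stays on the board, and any other access goes over the shared bus.

```python
class Bus:
    """Hypothetical stand-in for shared bus 1012; counts transactions."""

    def __init__(self):
        self.transactions = 0


class Board:
    """Hypothetical processor board of FIG. 2 owning one address range."""

    def __init__(self, base, size, bus):
        self.base = base
        self.size = size
        self.bus = bus

    def owns(self, address):
        # True if the address lies in this board's own main memory.
        return self.base <= address < self.base + self.size

    def access(self, address):
        """Return "local" if served on the board, "bus" otherwise."""
        if self.owns(address):
            return "local"
        self.bus.transactions += 1
        return "bus"


bus = Bus()
board0 = Board(base=0x0000, size=0x1000, bus=bus)
board1 = Board(base=0x1000, size=0x1000, bus=bus)

# A local variable of board0, allocated to board0's own main memory.
r_local = board0.access(0x0010)
# A shared variable allocated to board1's main memory, accessed from board0.
r_shared = board0.access(0x1010)
# The same shared variable accessed from the board that holds it.
r_owner = board1.access(0x1010)

print(r_local, r_shared, r_owner, bus.transactions)
```

Note that the sketch routes purely by address, which already presupposes that local and shared variables have been placed in distinguishable memory regions.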
However, the already existing parallel programs are written without distinguishing between the shared variables and the local variables, so that the clear distinction between the shared variables and the local variables required by this distributed shared-memory multiprocessor system of FIG. 2 imposes a considerable constraint on the programming of new parallel programs. In addition, in order to realize this distributed shared-memory multiprocessor system of FIG. 2, the CPU and the memory must be implemented on the same processor board in the form of a microprocessor, but most of the already existing microprocessors do not possess any means for making accesses that distinguish between the local variables and the shared variables.
On the other hand, as far as the memories are concerned, the consistency among the memories cannot be maintained unless every access that cannot be ruled out as an access to a shared variable is notified, by using the shared bus, to the cache memories of the processors on the other processor boards. In practice, therefore, every memory access command must be transmitted to the shared bus unless it is possible to make a clear distinction between the shared variables and the local variables.
Thus, it is impossible to reduce the traffic on the shared bus in this distributed shared-memory multiprocessor system of FIG. 2, and consequently the problem of the severely limited extensibility of the system remains.
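The reasoning of the preceding two paragraphs can be made concrete with a small sketch. It assumes a hypothetical helper `must_notify_bus` and a hypothetical list of address ranges known to hold only local variables; neither is part of the described system. The conservative rule modeled here is that an access may bypass the shared bus only when it is known to be an access to a local variable. When no such knowledge is available, as with a microprocessor that cannot distinguish the two kinds of variables, every access must go to the bus.

```python
def must_notify_bus(address, known_local_ranges):
    """Conservative consistency rule (illustrative assumption).

    An access may skip the shared bus only if its address lies in a range
    known to hold local variables; otherwise it might be an access to a
    shared variable and must be transmitted to the bus.
    """
    return not any(start <= address < end
                   for start, end in known_local_ranges)


# Hypothetical access trace: three local-variable accesses, one shared.
accesses = [0x0010, 0x0020, 0x1010, 0x0030]

# No means of distinguishing local from shared: nothing is known local.
no_distinction = sum(must_notify_bus(a, []) for a in accesses)

# With a clear distinction: addresses 0x0000-0x0FFF are known to be local.
with_distinction = sum(must_notify_bus(a, [(0x0000, 0x1000)])
                       for a in accesses)

print(no_distinction, with_distinction)
```

With the trace above, all four accesses reach the bus when no distinction is available, against only one when the local range is known.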