1. Field of the Invention
This invention relates to tightly coupled multiprocessor systems in which a plurality of processors, each provided with a private cache, share a memory space. More particularly, the invention relates to a technique for greatly increasing the memory-bus transfer bandwidth in a multiprocessor that employs a snoopy cache technique to maintain consistency of data among the caches.
2. Prior Art
Contention for access to the shared memory is the most serious bottleneck limiting system performance in a shared-memory multiprocessor system. To relieve this bottleneck, a private cache is often provided for each processor, thereby decreasing the bandwidth required of the shared memory. A technique for maintaining the consistency of data among these caches, the "snoopy cache" technique, is also well known. In this technique, each cache constantly monitors the memory accesses that occur on the shared bus (the "shared bus" herein means a communication medium to which a plurality of resources are connected and which is concurrently shared by these resources) and, when necessary, performs appropriate operations on the corresponding cache block to keep its data consistent with the other caches and the main memory. Such consistency operations are implemented in hardware. Because data consistency is thereby maintained simply and at high speed, the technique is widely adopted. However, the "snoopy cache" technique cannot resolve one significant problem, namely the bus bottleneck, because it is based on a shared bus architecture. The technique is accordingly practical only for small-scale parallel systems of somewhat more than ten processors.
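The snooping behavior described above can be illustrated with a minimal sketch. The text does not name a specific consistency protocol, so the sketch assumes a simple write-through, write-invalidate scheme; the class names `Bus` and `SnoopyCache` and the function `demo` are hypothetical and chosen only for illustration.

```python
class Bus:
    """Shared bus: every attached cache observes every write transaction."""

    def __init__(self):
        self.memory = {}        # main memory, address -> value
        self.caches = []        # all caches attached to this bus
        self.transactions = 0   # bus transactions issued so far

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, requester, addr):
        self.transactions += 1
        return self.memory.get(addr, 0)

    def write(self, requester, addr, value):
        self.transactions += 1
        self.memory[addr] = value
        # Every other cache snoops the write and reacts to it.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_write(addr)


class SnoopyCache:
    """Private cache (assumed write-through, write-invalidate)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}         # cached copies, address -> value
        bus.attach(self)

    def load(self, addr):
        if addr not in self.lines:              # miss: fetch over the bus
            self.lines[addr] = self.bus.read(self, addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)       # write-through to memory

    def snoop_write(self, addr):
        # Another cache wrote this address: drop the stale local copy.
        self.lines.pop(addr, None)


def demo():
    bus = Bus()
    a, b = SnoopyCache(bus), SnoopyCache(bus)
    first = b.load(0x10)     # b caches the initial value
    a.store(0x10, 42)        # b snoops the write and invalidates its copy
    second = b.load(0x10)    # b misses and refetches the new value
    return first, second, bus.transactions
```

In this sketch every store and every miss consumes one transaction on the single bus, which is the source of the bus bottleneck noted above.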
On the other hand, as a technique for essentially solving the bus bottleneck problem, the interconnection network (the "interconnection network" herein means a communication medium to which a plurality of resources are connected and which connects them one-to-one or one-to-many by means of a switch) has been studied for a long time. In a multiprocessor system coupled by an interconnection network, the number of coupling links increases with the number of processors constituting the system. The interconnection network technology therefore ensures a transfer bandwidth proportional to the number of processors and makes it possible to realize a large-scale parallel system including hundreds of processors. However, the private cache of each processor cannot monitor all memory accesses made by the other processors. It is therefore impossible in principle for such a system to control data consistency by hardware implementing the "snoopy cache" technique. Under these circumstances, consistency control by hardware is usually abandoned in favor of consistency control by software. In this approach, the caches are controlled by software so that copies of the same memory address are never held concurrently by a plurality of caches. More specifically, under the control of a software protocol, the corresponding copies in the caches are invalidated by software instructions at appropriate times to ensure that only one cache holds the copy at any point in time. The drawbacks of this technique are the increased load imposed on software and the decreased performance that results from static invalidation by software in place of dynamic optimization of cache use by hardware.
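The software approach described above, in which at most one cache holds a copy of a given address, can be sketched as follows. This is only an illustration under assumptions: the class `SoftwareManagedCaches` and its methods are hypothetical, and the sketch invalidates on every change of holder, whereas the text says only that invalidation is performed "at appropriate times".

```python
class SoftwareManagedCaches:
    """Caches kept consistent by explicit software invalidation (sketch)."""

    def __init__(self, num_caches):
        self.memory = {}                                  # main memory
        self.caches = [dict() for _ in range(num_caches)]
        self.owner = {}            # address -> index of the single holder
        self.invalidations = 0     # software invalidations performed

    def acquire(self, cpu, addr):
        """Software step: make `cpu` the only cache holding `addr`."""
        prev = self.owner.get(addr)
        if prev is not None and prev != cpu:
            # Write the previous holder's copy back, then invalidate it.
            self.memory[addr] = self.caches[prev].pop(addr)
            self.invalidations += 1
        if addr not in self.caches[cpu]:
            self.caches[cpu][addr] = self.memory.get(addr, 0)
        self.owner[addr] = cpu

    def read(self, cpu, addr):
        self.acquire(cpu, addr)
        return self.caches[cpu][addr]

    def write(self, cpu, addr, value):
        self.acquire(cpu, addr)
        self.caches[cpu][addr] = value

    def holders(self, addr):
        """Indices of all caches that currently hold a copy of `addr`."""
        return [i for i, c in enumerate(self.caches) if addr in c]
```

Each change of holder costs an explicit invalidation, which reflects the software load and performance penalty noted above.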
Next, as a prior art technique related to the present invention, a technique combining a snoopy bus and an interconnection network (Bhuyan, L. N.; Bao Chyn Liu; Ahmea, I. "Analysis of MIN based multiprocessors with private cache memories," Proceedings of the 1989 International Conference on Parallel Processing, 8th to 12th August, 1989, pp. 51-58) is discussed briefly. In this technique, a snoopy bus is provided in addition to an interconnection network. Memory accesses that require communication among caches for control of data consistency are processed through the snoopy bus, and normal memory accesses that do not require communication among caches are processed through the interconnection network. In order to decide whether communication among the caches is required, a table storing the states of all shared copies in the system is added to each cache. In this technique, the upper limit of the transfer bandwidth is determined by either the shared bus used for access to shared data or the interconnection network used for access to non-shared data, whichever saturates first. The upper limit of the transfer bandwidth therefore depends largely on the characteristics of the program being executed. It is reasonable to consider that, in a multiprocessor system using a snoopy cache technique well designed so as to significantly decrease the cache miss ratio, an appreciable fraction of all the access requests occurring on the system bus would be access requests generated by communication among caches for control of data consistency. Because the shared bus saturates as soon as this consistency traffic alone fills it, this technique merely realizes a transfer bandwidth several times wider than the bandwidth realized by the shared bus coupling technique alone.
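The bandwidth ceiling of the combined bus-and-network technique can be illustrated with a small calculation. The model below is an assumption made for illustration and is not taken from the cited paper: a fraction `coherence_fraction` of all accesses must use the snoopy bus, the remainder use the interconnection network, and the system saturates when either medium reaches its capacity. The function name and the numbers in the usage note are hypothetical.

```python
def max_total_bandwidth(bus_bw, net_bw, coherence_fraction):
    """Upper limit on total access bandwidth for a bus-plus-network system.

    bus_bw             -- capacity of the snoopy bus
    net_bw             -- capacity of the interconnection network
    coherence_fraction -- share of accesses that must use the snoopy bus
    All three are illustrative assumptions; units are arbitrary but equal.
    """
    f = coherence_fraction
    if not 0.0 <= f <= 1.0:
        raise ValueError("coherence_fraction must lie in [0, 1]")
    # Total traffic T puts f*T on the bus and (1 - f)*T on the network,
    # so T is bounded by whichever medium saturates first.
    bus_limit = bus_bw / f if f > 0 else float("inf")
    net_limit = net_bw / (1 - f) if f < 1 else float("inf")
    return min(bus_limit, net_limit)
```

For example, with a hypothetical bus capacity of 100, a network capacity of 6400, and one access in four requiring the bus, `max_total_bandwidth(100.0, 6400.0, 0.25)` gives 400.0: the bus saturates first, and the total is only four times the bus capacity no matter how large the network is.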
This technique also requires each cache to hold a management table describing the state of the entire system, so that the cache can determine locally whether an access requires the shared bus or only the interconnection network. In addition, the control mechanism of this technique becomes complicated because it must control both the shared bus and the interconnection network by reference to the table.