The invention relates to a cache coherence apparatus for a multiprocessor system in which a plurality of processor modules are connected through a system bus and, more particularly, to a coherence apparatus for the caches of a multiprocessor system in which, within each processor module, a plurality of processor elements with caches and a main storage are connected through an internal common bus.
In recent years, the processing performance of computer systems has been expected to improve remarkably, and the development of multiprocessor systems in which a plurality of processors are connected through a common bus is progressing. In such multiprocessor systems, the processing performance of a single processor is improving through the use of superscalar or VLIW (Very Long Instruction Word) processors. The cache mechanism contributes greatly to this improvement. In such a cache mechanism, a primary cache is built into the processor and a secondary cache is provided between the processor and an external main storage. With this configuration, the hit ratio of the secondary cache is raised, thereby reducing accesses to the main storage and improving performance. Further, instead of providing a single common memory for a plurality of processors, local memories that function as the main storage are distributed, one unit for each predetermined number of processors. Common memory areas are distributed across the plurality of local memories, so the common memory can be constructed flexibly according to the number of local memory units required by the system scale.
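As a rough illustration of why raising the secondary-cache hit ratio reduces main-storage accesses: in a two-level hierarchy, the fraction of processor accesses that reach main storage is the product of the two miss ratios. This is a minimal sketch assuming that h2 is the local hit ratio of the secondary cache (measured over primary-cache misses only); the function name and the hit-ratio values are hypothetical.

```python
def main_storage_access_ratio(h1: float, h2: float) -> float:
    """Fraction of processor accesses that reach main storage.

    h1: hit ratio of the primary (on-chip) cache.
    h2: local hit ratio of the secondary cache, i.e. the fraction of
        primary-cache misses that hit in the secondary cache (assumed).
    """
    if not (0.0 <= h1 <= 1.0 and 0.0 <= h2 <= 1.0):
        raise ValueError("hit ratios must lie in [0, 1]")
    # An access reaches main storage only if it misses in both caches.
    return (1.0 - h1) * (1.0 - h2)


# Hypothetical hit ratios, chosen only to illustrate the trend.
for h2 in (0.5, 0.8, 0.95):
    print(f"h2={h2:.2f}: {main_storage_access_ratio(0.9, h2):.3f}")
```

Holding h1 fixed, each increase in h2 shrinks the main-storage traffic proportionally, which is the effect the paragraph attributes to the secondary cache.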
FIG. 1 shows a typical conventional multiprocessor system. The system has two processor modules 1000-1 and 1000-2 of identical construction. The processor module 1000-1, for example, comprises processor elements 1100-1 and 1100-2, cache units 1200-1 and 1200-2, a common bus 1300-1, and a local storage unit 1400-1. The processor elements 1100-1 and 1100-2 contain primary caches and are additionally provided with the external cache units 1200-1 and 1200-2 as secondary caches. The local storage unit 1400-1 functions as the main storage of the processor elements 1100-1 and 1100-2 by allocating particular physical spaces to them. At the same time, the local storage unit 1400-1 allocates a common memory space that is shared by all of the processor elements 1100-1 to 1100-4 of the processor modules 1000-1 and 1000-2. Similarly, the processor module 1000-2 comprises processor elements 1100-3 and 1100-4, cache units 1200-3 and 1200-4, a common bus 1300-2, and a local storage unit 1400-2. The common buses 1300-1 and 1300-2 of the processor modules 1000-1 and 1000-2 operate as one bus connected via a back panel of a module casing.

In the cache control of the cache units 1200-1 to 1200-4 associated with the processor elements 1100-1 to 1100-4, when a read access by one of the processor elements causes a mishit, a copy of the data at the read address is transferred to the cache by accessing the corresponding local storage. When a write access by the processor causes a mishit in the cache, a copy of the data in the corresponding local storage is transferred to the cache and is then overwritten. At this point, the value in the main storage differs from the overwritten copy in the cache.
Therefore, cache coherence is maintained by a copy-back operation, which transfers the newest value in the cache to the main storage and updates the old value there. On the other hand, when the value at a certain local storage location has been copied into a plurality of caches, the copies in all caches other than the one to which the write access was performed are invalidated. The copy in the cache of the accessing source is then updated to the newest value and is subsequently copied back to the local storage, thereby maintaining cache coherence.
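The write-invalidate, copy-back behavior described above can be sketched as a toy Python model. This is a minimal sketch, assuming a single local storage, one-word cache lines, and an instantaneous bus; the class and method names (`CoherentSystem`, `read`, `write`, `copy_back`) are hypothetical and are not taken from the source. Serving a mishit by first copying back any dirty copy held in another cache is an assumption of this sketch, one common way to keep the newest value visible.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLine:
    value: int
    dirty: bool = False  # True once the copy differs from local storage


@dataclass
class Cache:
    lines: dict = field(default_factory=dict)  # address -> CacheLine


class CoherentSystem:
    """Toy model (hypothetical): several caches sharing one local storage."""

    def __init__(self, n_caches: int):
        self.storage: dict = {}  # address -> value held in local storage
        self.caches = [Cache() for _ in range(n_caches)]

    def _fetch(self, address: int) -> int:
        # Assumed policy: a dirty copy elsewhere holds the newest value,
        # so copy it back to local storage before serving the mishit.
        for cache in self.caches:
            line = cache.lines.get(address)
            if line is not None and line.dirty:
                self.storage[address] = line.value
                line.dirty = False
        return self.storage.get(address, 0)

    def read(self, cpu: int, address: int) -> int:
        cache = self.caches[cpu]
        if address not in cache.lines:  # read mishit: copy value into cache
            cache.lines[address] = CacheLine(self._fetch(address))
        return cache.lines[address].value

    def write(self, cpu: int, address: int, value: int) -> None:
        cache = self.caches[cpu]
        if address not in cache.lines:  # write mishit: copy value in first
            cache.lines[address] = CacheLine(self._fetch(address))
        # Invalidate the copies held by every other cache.
        for other, peer in enumerate(self.caches):
            if other != cpu:
                peer.lines.pop(address, None)
        # Update the accessing cache; local storage is now stale.
        line = cache.lines[address]
        line.value, line.dirty = value, True

    def copy_back(self, cpu: int, address: int) -> None:
        # Transfer the newest value from the cache to local storage.
        line = self.caches[cpu].lines.get(address)
        if line is not None and line.dirty:
            self.storage[address] = line.value
            line.dirty = False


if __name__ == "__main__":
    system = CoherentSystem(n_caches=2)
    system.storage[0x100] = 7
    system.read(0, 0x100)       # both caches now hold a copy of 7
    system.read(1, 0x100)
    system.write(0, 0x100, 9)   # cache 1's copy is invalidated
    print(0x100 in system.caches[1].lines, system.storage[0x100])
    system.copy_back(0, 0x100)  # newest value written back
    print(system.storage[0x100])
```

Between the write and the copy-back, local storage still holds the old value, which is exactly the window in which the invalidation of the other copies is what keeps the caches coherent.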
However, because the multiprocessor system of FIG. 1 connects all of the processor elements, cache units, and local storages to one bus, it has the following problems. First, to maintain cache coherence, commands and data are transferred between the cache unit of the processor element that generated the read or write access and the local storage unit holding the access address. Consequently, bus transfer requests for maintaining coherence compete among the processor elements, the load on the common bus increases with the number of processors, and high-speed processing cannot be achieved. Although the bus load can be reduced by providing a plurality of common buses, this approach cannot cope effectively when the number of processors increases to ten or twenty.
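The scaling problem can be made concrete with a deliberately simple linear load model. This is a hypothetical sketch, not a model from the source: it assumes each processor element issues coherence transfers at a fixed rate, each transfer occupies the bus for a fixed number of cycles, and traffic spreads evenly over the buses. The function name and the workload numbers are illustrative assumptions.

```python
def bus_utilization(n_processors: int, requests_per_cycle: float,
                    cycles_per_request: float, n_buses: int = 1) -> float:
    """Offered load on the common bus(es) as a fraction of capacity.

    Assumed linear model: load grows in proportion to the number of
    processors and is divided evenly among n_buses. A value of 1.0 or
    more means the buses are saturated.
    """
    return n_processors * requests_per_cycle * cycles_per_request / n_buses


# Hypothetical workload: one transfer every 40 cycles, 4 bus cycles each.
for n in (2, 4, 10, 20):
    load_one = bus_utilization(n, 1 / 40, 4)
    load_two = bus_utilization(n, 1 / 40, 4, n_buses=2)
    print(f"{n:2d} processors: 1 bus {load_one:.2f}, 2 buses {load_two:.2f}")
```

Under these assumed numbers, ten processors already saturate one bus and twenty saturate two, so adding buses only postpones the limit.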
Second, each of the processor modules 1000-1 and 1000-2 is constructed as a separate module casing. Therefore, the common buses 1300-1 and 1300-2 are connected by a cable using connectors via a back panel of the casing. As a result, the bus line length increases and, because of its electrical characteristics, the bus clock frequency cannot be raised. For example, whereas the clock frequency can be set to 60 MHz when the bus is confined to one module, it drops to 40 MHz when the buses are connected through the shared back panel.
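The cost of the lower clock can be quantified with the usual peak-bandwidth product of data width and clock frequency. The 60 MHz and 40 MHz figures come from the text; the 8-byte data width and the assumption of one transfer per clock are hypothetical. Whatever the width, the back-panel connection loses one third of the peak bandwidth, since 40/60 = 2/3.

```python
def peak_bandwidth_mb_s(width_bytes: int, clock_mhz: float) -> float:
    """Peak transfer rate, assuming one data transfer per bus clock."""
    return width_bytes * clock_mhz  # bytes x 10^6 transfers/s = MB/s


WIDTH = 8  # hypothetical 64-bit data bus; the source gives no width
single_module = peak_bandwidth_mb_s(WIDTH, 60)  # bus inside one module
back_panel = peak_bandwidth_mb_s(WIDTH, 40)     # bus across the back panel
print(single_module, back_panel, back_panel / single_module)
```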
In a multiprocessor system, the system bus has a particularly large influence on performance. Both the data transfer speed of the system bus and its usage efficiency, defined as the ratio of the cycles used for data transfer to the total bus cycles, must be raised. Otherwise, the system bus becomes a bottleneck, and increasing the number of processors does not improve performance. To improve the data transfer speed, the data bus is generally widened, or a small-signal-amplitude interface is used to raise the operating frequency of the bus. To raise the bus usage efficiency, a split-type transfer system can be used, in which a bus unit that has issued a request such as a memory access yields the bus usage privilege to another bus unit until the response is received.
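The usage-efficiency definition above, and why a split transfer raises it, can be illustrated with an idealized cycle count. This is a sketch under stated assumptions, not the source's method: each access is assumed to take `req` bus cycles to issue, `latency` cycles for the memory to respond, and `data` bus cycles to return data; every bus unit is assumed to always have one access pending; arbitration overhead and phase alignment are ignored. The function names and the numbers used are hypothetical.

```python
def held_bus_efficiency(req: int, latency: int, data: int) -> float:
    """Bus held from the request until the response data has been sent."""
    # Only the data cycles count as useful; the latency cycles are wasted.
    return data / (req + latency + data)


def split_bus_efficiency(req: int, latency: int, data: int, units: int) -> float:
    """Bus released during the memory latency so other units can use it.

    Idealized: each of `units` bus units always has one access in flight.
    """
    period = req + latency + data  # one access per unit per period
    busy = min(1.0, units * (req + data) / period)  # fraction of cycles in use
    return busy * data / (req + data)  # share of all cycles that move data


# Hypothetical timing: 1 request cycle, 6 latency cycles, 4 data cycles.
for units in (1, 2, 4):
    print(units, round(held_bus_efficiency(1, 6, 4), 3),
          round(split_bus_efficiency(1, 6, 4, units), 3))
```

With a single unit the two schemes coincide; the split transfer pays off only when other units have requests ready to fill the latency gap, and its ceiling is data / (req + data) once the bus is saturated.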