The present invention relates to a multiprocessor system. More particularly, the present invention relates to a multiprocessor system in which a plurality of processors are interconnected to a plurality of cache memories by interconnection apparatus which maintains coherency between the cache memories.
Many conventional shared memory multiprocessors have a common configuration in which several processor units and memory units are connected through a bus and which employs a snoop cache scheme to guarantee the coherence among the contents of the caches in the processor units.
Examples of such computer systems can be found in "Ben Catanzaro, Multiprocessor System Architectures, Sun Microsystems, 1994" (referred to as reference literature 1), or "Don Anderson/Tom Shanley, PENTIUM PROCESSOR SYSTEM ARCHITECTURE, Second Edition, MINDSHARE INC., 1995" (referred to as reference literature 2). In these conventional examples, two or more processor units and memory units are connected by a single bus. The processor units are interconnected by a cache coherency check result bus, which has a shared signal and a dirty signal. In the following, a memory access request that requires checking the status of other caches will be referred to as a coherent read request. The information concerning the status of the corresponding cache line that each cache returns to the source of a coherent read request, in response to that request, will be referred to as a coherency status report. The operation flow is as follows.
(1) A processor unit requesting certain data puts a coherent read request on the bus connecting the processor unit and the main memory.
(2) When the processor units find the coherent read request on the bus, they send their replies out on the cache coherency check result bus interconnecting the processor units. For example, a processor unit that holds the requested data in a clean state asserts the shared signal, and a processor unit that holds the requested data in a dirty state asserts the dirty signal. The requesting source checks the shared signal and the dirty signal at predetermined cycles. The predetermined cycles differ from one conventional system to another. In the case of reference literature 1, the predetermined cycles are fixed cycles after the coherent read request has been sent on the bus and, in the case of reference literature 2, the cycles extend until the data from a memory is returned. If the shared signal is asserted, the requesting source judges that at least one other processor unit shares the data and determines the next state of its own cache accordingly. If the dirty signal is asserted, it judges that at least one other processor unit holds the latest data and determines the next state of its own cache and the data sending source accordingly. This processing of determining the state of its own cache or determining the latest data sending source based on the coherency status reports from a plurality of processor units is referred to as summation of coherency status reports.
(3) The memory unit sends the requested data to the coherent read requesting processor unit.
(3') If one of the processor units has already updated the requested data, that processor unit, in place of the memory unit, sends the updated data to the coherent read requesting source.
This scheme of summing the status reports from a plurality of processors by using wired logic of the bus will be referred to as a bus summary scheme.
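The summation performed in the bus summary scheme can be illustrated with a minimal Python sketch. It is not taken from reference literature 1 or 2. The names LineState, BusSummary and summarize_on_bus are hypothetical, and the next-state policy (a MESI-like choice between "shared" and "exclusive") is an assumption, since the actual next state depends on the protocol of each system. The wired logic of the bus is modeled as a logical OR over the signals driven by the snooping processor units.

```python
from dataclasses import dataclass
from enum import Enum


class LineState(Enum):
    """State of the requested line in one snooping cache (hypothetical names)."""
    INVALID = "invalid"
    CLEAN = "clean"
    DIRTY = "dirty"


@dataclass
class BusSummary:
    shared: bool       # wired-OR of the shared signals
    dirty: bool        # wired-OR of the dirty signals
    next_state: str    # assumed policy, not specified by the text
    data_source: str   # "memory" or "cache"


def summarize_on_bus(snooper_states):
    """Sum the coherency status reports of all snooping processor units."""
    # Each unit drives one signal; the bus wiring ORs them together.
    shared = any(s is LineState.CLEAN for s in snooper_states)
    dirty = any(s is LineState.DIRTY for s in snooper_states)
    # Step (3'): a unit holding updated data supplies it instead of memory.
    data_source = "cache" if dirty else "memory"
    # Assumption: the requester takes the line as shared if any other
    # unit holds it, and as exclusive otherwise.
    next_state = "shared" if (shared or dirty) else "exclusive"
    return BusSummary(shared, dirty, next_state, data_source)


if __name__ == "__main__":
    print(summarize_on_bus([LineState.INVALID, LineState.CLEAN]))
    print(summarize_on_bus([LineState.DIRTY, LineState.INVALID]))
```

Because the sketch models a single pair of shared and dirty signals, it handles only one coherent read request at a time, which mirrors the restriction of a summary carried on shared bus wires.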
Japanese Patent Laid-Open No. 281956/1995 (referred to as reference literature 3) discloses a coherency status report sum-up scheme for cases where a plurality of coherent read requests are executed simultaneously in an overlapping manner. In this conventional scheme, a plurality of processor units and one memory unit are connected by a single bus and the processor units are each connected with the memory unit by separate coherency status report lines. The operation flow during the coherent read is as follows:
(1) A processor unit requesting certain data sends a coherent read request out on a bus connecting the processor units and the memory unit.
(2) Each processor sends its coherency status report to the memory unit through the coherency status report line. The memory unit sums up the coherency status reports sent from the processors to determine the next state of the cache of the coherent read requesting source.
(3) The memory unit sends the requested data to the coherent read requesting processor unit. At the same time, the memory unit reports the next state of the cache to the coherent read requesting processor unit through the status report line provided on the bus.
(3') If any of the processors has already updated the requested data, that processor unit instead of the main memory sends the updated data to the coherent read requesting processor unit.
The above-described scheme will be referred to as a unit centralized summary scheme.
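As a rough illustration of the unit centralized summary scheme, the following Python sketch models a single memory unit that collects coherency status reports per request and sums them once every other processor unit has reported. It is not taken from reference literature 3. The class CentralSummarizer, its methods, the string state names and the next-state policy are all hypothetical assumptions made for the example.

```python
class CentralSummarizer:
    """Memory-unit-side summation of coherency status reports (sketch)."""

    def __init__(self, num_processors):
        self.num_processors = num_processors
        # request_id -> (requester_id, {processor_id: reported state})
        self.pending = {}

    def start(self, request_id, requester_id):
        """Register a coherent read request seen on the bus."""
        self.pending[request_id] = (requester_id, {})

    def report(self, request_id, processor_id, state):
        """Accept one report; return the summary once all have arrived."""
        requester_id, reports = self.pending[request_id]
        reports[processor_id] = state  # "invalid", "clean" or "dirty"
        # All units except the requester must report before summing.
        if len(reports) < self.num_processors - 1:
            return None
        del self.pending[request_id]
        dirty = "dirty" in reports.values()
        shared = "clean" in reports.values()
        # Assumed policy: shared if any other unit holds the line.
        next_state = "shared" if (shared or dirty) else "exclusive"
        # Step (3'): the updating processor supplies the data if dirty.
        data_source = "cache" if dirty else "memory"
        return requester_id, next_state, data_source


if __name__ == "__main__":
    mem = CentralSummarizer(num_processors=3)
    mem.start("A", requester_id=0)
    mem.start("B", requester_id=1)           # overlaps with request A
    print(mem.report("A", 1, "invalid"))     # still waiting for unit 2
    print(mem.report("B", 0, "dirty"))       # still waiting for unit 2
    print(mem.report("A", 2, "clean"))
    print(mem.report("B", 2, "invalid"))
```

Keeping the reports in a table keyed by request is what allows overlapping requests in this sketch. It also shows why the scheme presupposes one summing point: every report for a given request has to reach the same unit.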
In realizing the snoop cache scheme, the above conventional examples assume that the coherent read request is distributed through the bus connecting the processor units and the memory unit. Although this configuration is effective in connecting a small number of processors at low cost, the bus traffic increases as the number of processor units or main memory units increases, making it difficult to improve performance. When building a large-scale multiprocessor system, the number of units to be driven increases and the physical size becomes large, making it difficult to raise the operating frequency. To deal with this problem, Japanese Patent Laid-Open No. 138782/1997 (referred to as reference literature 4) discloses a method of performing snoop by using, instead of a bus, an interconnection network that can transfer addresses and data in parallel, specifically a crossbar network. This conventional example, though it discloses the method of distributing the coherent read request, does not describe the method of sending a coherency status report or the method of summing the cache coherency check results.
Of the above conventional examples, the bus summary scheme has difficulty improving the operating frequency because the coherency status reports are sent through the bus. In systems where multiple coherent read requests are executed simultaneously in an overlapping manner, the next cache coherency check result cannot be sent out until the summation of the current cache coherency check results is completed, which limits the number of coherent read requests that can be overlapped. The unit centralized summary scheme cannot be applied to cases where there are a plurality of main memory units or where a plurality of main memory control units are employed to enhance the throughput.
Further, none of the above-described conventional systems can be applied to the snoop scheme using an interconnection network such as the crossbar network described above.