The present invention relates to a tightly-coupled multiprocessor system comprising a plurality of processor units which share a main memory and are connected by an interconnection network.
In many prior art tightly-coupled multiprocessor systems, the interconnection network connecting the processor units and the main memory shared by them is either a shared bus or a network (a parallel transfer network) which can transfer plural messages in parallel. For the latter, the cache directory method is known as one method of maintaining cache coherency among the processor units. See, for instance, L. Censier and P. Feautrier, "A New Solution to Coherence Problems in Multicache Systems," IEEE Transactions on Computers, Vol. C-27, No. 12, pp. 1112-1118 (1978) (hereinafter referred to as the reference document 1).
In this method, a directory collectively holds, for every cache-line-sized area of the main memory, the cache status of that area's data in each processor unit. Cache line transfer requests, invalidate requests, and the like are sent through the parallel transfer network only to the specific processor units designated by the directory. Therefore, this method has the advantage that unnecessary coherent read requests are not sent to the caches of the other processor units.
In this method, however, cache miss latency becomes long, because three data transfers are executed for one coherent read request. Concretely, a memory read request is first sent from the processor unit requesting the data to the main memory through the parallel transfer network. The main memory then inspects the directory. When another processor unit has updated the data, the main memory issues a line transfer request to the cache of that processor unit. Finally, that processor unit transfers the data to the request source processor unit according to the line transfer request.
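The three-transfer sequence described above can be illustrated by a minimal sketch. All names here (directory_coherent_read, the dictionary-based directory, the "PU0"/"PU1" unit labels) are illustrative and are not taken from the reference document 1:

```python
# Minimal sketch of the directory-method read flow described above.
# The directory maps each cache line to the processor unit, if any,
# that has updated it (many details of a real directory are omitted).

def directory_coherent_read(directory, line, requester):
    """Return the number of network transfers for one coherent read."""
    transfers = 1  # (1) requester -> main memory: memory read request
    owner = directory.get(line)  # directory lookup at the memory side
    if owner is not None and owner != requester:
        transfers += 1  # (2) main memory -> owner: line transfer request
        transfers += 1  # (3) owner -> requester: the updated line data
    else:
        transfers += 1  # (2) main memory -> requester: the line data
    directory[line] = requester  # record the new holder (simplified)
    return transfers

print(directory_coherent_read({0x100: "PU1"}, 0x100, "PU0"))  # 3 transfers
print(directory_coherent_read({}, 0x200, "PU0"))              # 2 transfers
```

As the sketch shows, the worst case (another unit holds updated data) costs three transfers, which is the source of the long cache miss latency noted above.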
Another cache coherency maintenance method is the snoop cache method. Refer, for instance, to Ben Catanzaro, "Multiprocessor System Architectures," Sun Microsystems, pp. 157-170, 1994 (hereinafter referred to as the reference document 2) or Don Anderson and Tom Shanley, "Pentium Processor System Architecture, Second Edition," MindShare, Inc., pp. 61-91, 1995 (hereinafter referred to as the reference document 3). In this method, each processor unit manages the cache status of the data held in its own cache. Coherency is maintained by communication between the processor unit requesting data and all the other processor units.
There are various variations of the snoop method, but a typical one is as follows. A processor unit which requests data sends a coherent read request to the shared bus. Each of the other processor units receives this coherent read request from the shared bus, checks the cache status of the data designated by the request, and notifies the request source processor unit of that status. The main memory transfers the data designated by the request to the request source processor unit. However, when one of the processor units has updated the data designated by the request, that processor unit transfers the data to the request source processor unit in place of the main memory.
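The snoop read flow above can likewise be sketched in miniature. The function and state names (snoop_coherent_read, the "MODIFIED"/"SHARED" labels, the per-unit cache dictionaries) are illustrative assumptions, not taken from the reference documents 2 or 3:

```python
# Minimal sketch of the snoop-method read flow described above.
# Each processor unit's cache is modeled as a dict from line address
# to a cache status string.

def snoop_coherent_read(caches, line, requester):
    """Return (data source, transfer count) for one coherent read."""
    transfers = 1  # (1) requester -> shared bus: coherent read request
    source = "main memory"
    for unit, cache in caches.items():
        if unit == requester:
            continue
        # Every other processor unit snoops the bus and checks its tag.
        if cache.get(line) == "MODIFIED":
            source = unit  # this unit supplies the data instead of memory
            cache[line] = "SHARED"
    transfers += 1  # (2) source -> requester: the requested line
    return source, transfers

caches = {"PU0": {}, "PU1": {0x100: "MODIFIED"}, "PU2": {}}
print(snoop_coherent_read(caches, 0x100, "PU0"))  # ('PU1', 2)
print(snoop_coherent_read(caches, 0x200, "PU0"))  # ('main memory', 2)
```

Note that the transfer count is two regardless of whether memory or another unit supplies the data, which is the advantage over the directory method discussed next.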
Therefore, the snoop method is superior to the directory method in that read processing completes with two transfers in any case: transfer of a coherent read request from the request source processor unit, and transfer of the requested data from either the main memory or a processor unit. In the snoop method, however, coherent read requests from all the processor units are sent to the shared bus. Therefore, the busy rate of the shared bus increases as the number of processor units increases. Here, the busy rate is defined as the ratio of the number of requests actually accepted per unit time to the maximum number of requests acceptable per unit time. As a result, the wait time for arbitration of the shared bus increases, so the problem occurs that the time until necessary data arrives at a processor unit, that is, the cache miss latency, increases. Moreover, in this method even a cache which does not hold the shared data must respond to the coherent read request on the shared bus and search its cache tag. Therefore, the busy rate of the cache tag also increases, which further increases the cache miss latency.
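The busy-rate definition above can be illustrated with hypothetical numbers (the request counts below are assumptions chosen for illustration, not measurements from any reference document):

```python
# Illustration of the busy rate defined above:
#   busy rate = requests actually accepted per unit time
#               / maximum requests acceptable per unit time

def busy_rate(requests_accepted, max_acceptable):
    return requests_accepted / max_acceptable

# Hypothetical example: over 100 cycles, a shared bus that can accept
# one request per cycle receives 40 coherent read requests from 4
# processor units; with 8 units it would receive about 80.  The busy
# rate, and hence the arbitration wait time, grows with the unit count.
print(busy_rate(40, 100))  # 0.4
print(busy_rate(80, 100))  # 0.8
```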
In Japanese Laid-Open Patent Application No. HEI 04-328653 and its corresponding U.S. Pat. No. 5,386,511 (hereinafter referred to as the reference document 4), a method is disclosed in which only the coherent read request is transferred over the shared bus, while other information, such as the memory data, is transferred over an interconnection network which can transfer messages in parallel.
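The routing split in the scheme of the reference document 4 can be sketched as follows; the function name and message-type strings are illustrative assumptions, not terminology from that document:

```python
# Sketch of the hybrid interconnect scheme described above: coherent
# read requests travel on the shared bus, while all other messages,
# such as memory data, travel on the parallel transfer network.

def route(message_type):
    """Pick the interconnect for a message in the hybrid scheme."""
    if message_type == "coherent_read_request":
        return "shared bus"
    return "parallel network"

print(route("coherent_read_request"))  # shared bus
print(route("memory_data"))            # parallel network
```

Keeping only the (small) coherent read requests on the shared bus reduces its busy rate, while the bulky data transfers proceed in parallel over the network.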