The present invention relates to an access request control scheme for a main memory shared by a multiprocessor system incorporating a plurality of processing units by using directory information indicating which processing unit holds specific data in an address of the main memory. More specifically, the present invention relates to a main memory access control scheme suitable for a parallel computer system with a distributed main memory for the processing units connected through a network capable of performing parallel data transfer.
In a parallel computer system, there is a well-known architecture having a main memory shared by a plurality of processing units (referred to as PUs herein), wherein each processing unit is provided with a cache.
In particular, Japanese Laid-Open Patent Application No. 5-89056 (referred to as "reference #1" herein), "The Stanford DASH Multiprocessor," IEEE Computer, March 1992, pp. 63-79 (referred to as "reference #2" herein) and the like have proposed a parallel computer system having a physically distributed and logically shared (distributed shared) memory system for this type of parallel processor.
In these parallel computers, the main memory is distributed among the PU's, and each PU is coupled to a network, such as a multistage interconnection network, capable of transferring a plurality of data in parallel, in order to provide a network throughput suited to the number of PU's and to avoid a limit on the number of PU's that can be connected.
In the main memory controller of the Prior Art parallel computer using the distributed shared memory scheme, as documented in reference #2, a directory is associated with each data line of the main memory of each respective PU. The directory, which indicates the PU's by which a data line is cached (that is, which PU's have a copy of that data line in their own caches), is stored in a memory dedicated to this purpose and separate from the main memory.
When a command for maintaining cache coherency must be sent, for example when shared data has been modified, the command is first sent to the main memory. Then, the main memory controller sends the command to the PU's indicated by the directory associated with the main memory. At the same time, the contents of the directory are updated. When a PU writes data, an invalidation command is sent to all other PU's indicated by the directory for that data, and all copies cached in those PU's are erased. When a PU reads cached data, the read command is sent to one of the PU's indicated by the directory so that the cached data is provided from the cache of that PU to the PU requesting the read.
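The write and read handling described above can be illustrated with a minimal Python sketch. It is not the controller of reference #2: the class name `Directory`, its method names, and the rule of picking the lowest-numbered holder as the supplier are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Hypothetical directory: maps a line address to the set of PU numbers caching it."""
    sharers: dict = field(default_factory=dict)

    def handle_write(self, line, writer):
        """Return the PU's that must receive an invalidation command.

        After the write, only the writer is recorded as holding the line.
        """
        holders = self.sharers.get(line, set())
        targets = sorted(holders - {writer})
        self.sharers[line] = {writer}
        return targets

    def handle_read(self, line, reader):
        """Return the PU asked to supply the line, or None if no other PU caches it.

        Choosing the lowest-numbered holder is an arbitrary assumption.
        The reader is then recorded as a new holder.
        """
        holders = self.sharers.setdefault(line, set())
        others = holders - {reader}
        supplier = min(others) if others else None
        holders.add(reader)
        return supplier
```

In this sketch, a read of a line that no PU caches returns `None`, meaning the main memory itself would supply the data; a later write by one sharer returns every other sharer as an invalidation target.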
By managing the caches by means of a directory structure, a command to maintain cache coherency is sent only to the PU's caching the relevant data line. As the command is not sent to other PU's, broadcasting to all PU's is not necessary. Thus, the cache coherency of the main memory data distributed among the network-connected PU's may be managed efficiently.
In reference #2, the directory for each data line is represented by a so-called bitmap, which has one bit per PU indicating whether or not that PU caches the data line.
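As an illustration of such a bitmap, the following Python sketch keeps one bit per PU in an integer. The constant `NUM_PUS` and the helper names are hypothetical, and the sketch does not reproduce the actual hardware layout of reference #2.

```python
NUM_PUS = 16  # assumed system size, for illustration only


def set_cached(bitmap, pu):
    """Return the bitmap with the bit for `pu` set (PU now caches the line)."""
    return bitmap | (1 << pu)


def clear_cached(bitmap, pu):
    """Return the bitmap with the bit for `pu` cleared (copy invalidated)."""
    return bitmap & ~(1 << pu)


def is_cached(bitmap, pu):
    """Return True if the bit for `pu` is set."""
    return bool((bitmap >> pu) & 1)


def sharers(bitmap):
    """Return the list of PU numbers whose bit is set, in ascending order."""
    return [pu for pu in range(NUM_PUS) if (bitmap >> pu) & 1]
```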
Another scheme has also been proposed, in which the numbers of the PU's actually caching a data line are stored as the directory, instead of a bitmap. See, for example, "Directory-based Cache Coherence in Large-Scale Multiprocessors," IEEE Computer, June 1990, pp. 49-58 (referred to as "reference #3" herein). The technique described in this reference is called the "limited pointer" scheme (or simply, the pointer scheme), in which the number of PU numbers stored as the directory is limited to a given number, such as eight.
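A limited pointer directory entry might be sketched in Python as follows. The class `LimitedPointerEntry` is hypothetical, and the overflow policy used here (displacing the oldest pointer so that its copy can be invalidated) is only one possible choice assumed for this example; the text above does not specify how overflow is handled.

```python
class LimitedPointerEntry:
    """Hypothetical directory entry holding at most `max_pointers` PU numbers."""

    def __init__(self, max_pointers=8):
        self.max_pointers = max_pointers
        self.pointers = []  # PU numbers currently caching the line

    def add_sharer(self, pu):
        """Record `pu` as caching the line.

        Returns the PU number displaced on overflow (assumed policy:
        oldest pointer first), or None if no pointer was displaced.
        """
        if pu in self.pointers:
            return None
        evicted = None
        if len(self.pointers) == self.max_pointers:
            evicted = self.pointers.pop(0)
        self.pointers.append(pu)
        return evicted
```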
A further scheme has been proposed, in which the PU numbers stored under this pointer scheme are held in a location in the main memory other than the locations of the data lines. See, for example, "The Stanford FLASH Multiprocessor," Proc. of the 21st Annual International Symposium on Computer Architecture, April 18-21, 1994, pp. 302-313 (referred to as "reference #4" herein).
Since the Prior Art of reference #2 must hold, for each line of the main memory, a directory indicating which PU's cache that line, it has the disadvantage of requiring a large amount of memory for the directory. In the Prior Art example mentioned above, given that the system has 16 PU's and that the machine has 1 word = 8 bytes and 1 line = 4 words, a directory of 16 bits is needed for each line of 4 × 8 × 8 = 256 bits, so the amount of directory memory becomes 1/16 of the size of the main memory, and the hardware cost is high. Moreover, the more PU's the computer system has, the more the directory cost increases. For example, if a machine with 256 PU's holds its directories as the bitmap mentioned above, a directory of 256 bits is required for each 256-bit line. The directory then occupies as much memory as the main memory itself.
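The arithmetic above can be checked with a short Python sketch. The function name `bitmap_directory_overhead` is hypothetical; the default word and line sizes are those given in the text (1 word = 8 bytes, 1 line = 4 words).

```python
from fractions import Fraction


def bitmap_directory_overhead(num_pus, words_per_line=4, bytes_per_word=8):
    """Return directory bits per line divided by data bits per line.

    A bitmap directory needs one bit per PU for every line.
    """
    line_bits = words_per_line * bytes_per_word * 8  # 4 x 8 x 8 = 256 bits
    return Fraction(num_pus, line_bits)
```

With these defaults, 16 PU's give a ratio of 1/16 and 256 PU's give a ratio of 1, matching the figures stated above.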
The pointer scheme described in reference #3 requires less dedicated memory for directories than the bitmap style directory. However, the amount may not be negligible relative to the line size, since a plurality of pointers must be held for each line.
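To give a rough sense of this cost, the following Python sketch estimates the pointer storage per line. It assumes, purely for illustration, that each pointer needs just enough bits to encode a PU number, that eight pointers are held per line, and that a line is 256 bits; the function name `pointer_directory_overhead` is hypothetical, and real implementations may add state bits not counted here.

```python
from fractions import Fraction


def pointer_directory_overhead(num_pus, max_pointers=8, line_bits=256):
    """Return pointer bits per line divided by data bits per line.

    Assumes each pointer uses the minimum number of bits that can
    encode PU numbers 0 .. num_pus - 1.
    """
    bits_per_pointer = max(1, (num_pus - 1).bit_length())
    return Fraction(max_pointers * bits_per_pointer, line_bits)
```

Under these assumptions, a 256-PU machine would need 8 × 8 = 64 bits per 256-bit line, a quarter of the line size, which is smaller than the bitmap but still significant.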
Another pointer scheme mentioned in the reference #4 uses main memory as the storage of directories to eliminate the requirement of dedicated memory. However, this scheme has a problem in that the memory space for data storage may be decreased, since the amount of main memory used for the storage of directories cannot be neglected.
As set forth above, if the distributed shared memory is implemented by using the directory scheme of the Prior Art, the cost of hardware requirement will significantly increase, because the amount of memory used for the storage of directories becomes large when compared with the amount of main memory for data storage.