The present invention relates to a multiprocessor system, and more particularly to a control method for assuring the coherency of the main memory in a shared memory parallel computer system.
In recent years, computer systems, especially high-end models, have generally adopted a shared memory multiprocessor (SMP) configuration. Further, the number of processors sharing the same main memory in a computer system has tended to increase in order to enhance the performance of the entire system.
When a computer system has cache memory (hereinafter referred to simply as a cache), the computer must be controlled so as to assure consistency between data read from the cache and that read from the main memory. Especially in a shared memory parallel computer, if one processor has rewritten data stored in the main memory, the change in the data must be reflected in the copies stored in the caches of all the other processors. Such control is referred to as “main memory/cache coherency control”.
Conventional computer systems use a protocol such as the MESI protocol to perform coherency control. The MESI protocol manages each cache line in a cache as being in one of four states: “M” (Modified), “E” (Exclusive), “S” (Shared), or “I” (Invalid). (A cache line is a contiguous memory region in a cache, and data in a cache is handled on a cache line basis.) In a computer using the MESI protocol, each processor generally determines whether to issue a cache coherency control request for a particular cache line based on the MESI state of that line.
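As a rough illustration of how the four states drive that decision, the following Python sketch models a single cache line. The names MesiState and needs_coherency_request are hypothetical, and the rules are the usual textbook simplification of MESI rather than a description of any particular processor.

```python
from enum import Enum


class MesiState(Enum):
    MODIFIED = "M"   # only copy, differs from main memory
    EXCLUSIVE = "E"  # only copy, same as main memory
    SHARED = "S"     # other caches may also hold a copy
    INVALID = "I"    # no valid copy in this cache


def needs_coherency_request(state: MesiState, is_write: bool) -> bool:
    """Return True if a local access must be announced to other processors.

    Simplified assumption: reads and writes to M or E lines, and reads to
    S lines, are satisfied locally; everything else needs a request.
    """
    if state is MesiState.INVALID:
        return True   # cache miss: the line must be fetched
    if is_write and state is MesiState.SHARED:
        return True   # other copies must be invalidated first
    return False
```

In this simplified model, a write that hits a Shared line is the case in which a hit still produces a coherency control request.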
However, in a computer system using a conventional cache coherency control technique such as the MESI protocol, each processor knows only the states of the lines in its own cache. This means that, for example, when an access misses the cache and the line is fetched from the main memory, a coherency control request must be broadcast to all the other processors in the system.
Further, even when an access hits the cache, the processor must broadcast a coherency control request, such as a cache invalidation request, to all the other processors in the system, since the processor has no information indicating which of them hold a copy of the line in their caches.
Therefore, in computer systems that use only a conventional protocol such as the MESI protocol to perform cache coherency control, the number of cache coherency control transactions increases as the number of processors sharing the same main memory increases. This means that a computer system having a large-scale SMP configuration suffers degraded performance due to the bottleneck created by such coherency control.
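The growth can be made concrete with a deliberately simple model. The sketch below assumes that every processor issues the same number of coherency control requests and that each request is delivered to every other processor. The function name and both parameters are hypothetical and are not taken from any measured system.

```python
def broadcast_messages(num_processors: int, requests_per_processor: int) -> int:
    """Total messages delivered when every request is broadcast.

    Assumed model: each request from one processor is delivered to
    each of the other (num_processors - 1) processors.
    """
    return num_processors * requests_per_processor * (num_processors - 1)
```

Under this assumption the total grows roughly with the square of the processor count: going from 2 to 16 processors at 100 requests each raises the count from 200 to 24,000 messages.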
To address the above problem, the following document proposes a method for reducing the number of coherency control transactions using a directory system: “The Stanford FLASH Multiprocessor”, PROCEEDINGS OF THE 21ST ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (SPECIAL ISSUE ISCA '21 PROCEEDINGS, Apr. 18-21, 1994), pp. 302-313. In this method, each line in the main memory has directory information indicating which processor(s) hold a copy of the line in their caches. Each processor checks this directory information before issuing a coherency control transaction and sends the transaction only to the processors that actually require coherency control for that line, thereby considerably reducing the number of transactions.
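The difference between broadcasting and directory-based targeting can be sketched as follows. This is a minimal illustration, not the mechanism of the cited system: the directory is modeled as a plain dictionary from line address to the set of processor numbers holding a copy, and the function name and parameters are hypothetical.

```python
from typing import Dict, Set


def invalidation_targets(directory: Dict[int, Set[int]],
                         line_addr: int,
                         requester: int,
                         num_processors: int,
                         use_directory: bool) -> Set[int]:
    """Return the processors that must receive an invalidation request."""
    if not use_directory:
        # No sharing information: every other processor is a target.
        return set(range(num_processors)) - {requester}
    # Directory lookup: only recorded sharers other than the requester.
    sharers = directory.get(line_addr, set())
    return set(sharers) - {requester}
```

For example, with four processors and a directory of {0x1000: {1, 3}}, a request from processor 1 for line 0x1000 goes to processors 0, 2, and 3 when broadcast, but only to processor 3 when the directory is consulted.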
However, this method requires that each line have a directory entry indicating which processor(s), if any, hold a copy of the line. Implementing such a function requires a large amount of hardware, resulting in increased cost. Assume, for example, that the line size is 128 B (bytes) and the main memory size is 4 GB. In such a case, even if the directory size per entry is only 1 bit, 32 Mbit (= 4 GB / 128 B × 1 bit), that is, 4 MB, of memory is additionally needed, and the amount grows in proportion to the number of bits per entry (32 MB at 8 bits per entry), which is a large overhead.
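The overhead estimate can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and binary units (1 GB = 2^30 bytes) are an assumption. It counts one directory entry per line and converts the total number of bits to bytes.

```python
def directory_overhead_bytes(memory_bytes: int,
                             line_bytes: int,
                             bits_per_entry: int) -> int:
    """Bytes of storage needed for one directory entry per memory line."""
    entries = memory_bytes // line_bytes   # one entry per line
    total_bits = entries * bits_per_entry
    return total_bits // 8                 # 8 bits per byte


GIB = 2 ** 30
MIB = 2 ** 20

one_bit = directory_overhead_bytes(4 * GIB, 128, 1)    # 4 MiB
eight_bit = directory_overhead_bytes(4 * GIB, 128, 8)  # 32 MiB
```

Under these assumptions there are 32 Mi entries, so 1 bit per entry amounts to 32 Mbit (4 MiB) of directory storage, and 8 bits per entry amounts to 32 MiB.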
Because it requires a large amount of memory, the directory is often implemented in the main memory (DRAM). The processor must therefore first read the directory in the main memory each time it accesses the main memory. This considerably increases the access latency to the main memory, since the latency for reading the directory itself is long. To solve this problem, directory information may be copied to the cache. However, such an arrangement further increases the amount of hardware, resulting in increased implementation cost.
To overcome these problems, Japanese Patent Laid-Open No. 10-240707 proposes a method of assigning each directory entry to a page, which is a larger continuous memory region than a line.
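The effect of coarser granularity on directory indexing can be sketched as follows. The 4 KB page size and 128 B line size are assumed here for illustration only, and directory_index is a hypothetical helper rather than anything defined in the cited publication.

```python
def directory_index(address: int, region_bytes: int) -> int:
    """Index of the directory entry covering the given byte address.

    region_bytes is the size of the memory region tracked by one entry,
    for example a 128 B line or an assumed 4 KB page.
    """
    return address // region_bytes


LINE_BYTES = 128
PAGE_BYTES = 4096  # assumed page size

# Two addresses in the same page but in different lines.
a, b = 0x1000, 0x1F80
same_page_entry = directory_index(a, PAGE_BYTES) == directory_index(b, PAGE_BYTES)
same_line_entry = directory_index(a, LINE_BYTES) == directory_index(b, LINE_BYTES)
```

With one entry per page, both addresses map to the same entry, so a single entry stands in for all 32 lines of the page and the directory needs correspondingly fewer entries.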
However, assigning each directory entry to a continuous memory region larger than a line makes it difficult for hardware to detect whether the cache stores a copy of the memory region indicated by a given directory entry. Therefore, this system requires support from software such as the OS to reset directory information.
Furthermore, the execution time is generally longer when the reset function is implemented in software than when it is implemented in hardware, which degrades performance. Implementing the reset function in hardware presents its own problem: a cache tag must be read out a plurality of times to determine the presence or absence of a copy, which makes it difficult to reset directory information at high speed.
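The cost of the hardware check can be illustrated with a simple model. In the sketch below, the cache tag array is represented as a Python set of cached line addresses, and the 4 KB page size and 128 B line size are assumptions; the function name is hypothetical. Each membership test stands for one cache tag read.

```python
from typing import Set, Tuple


def page_has_cached_copy(cached_lines: Set[int],
                         page_base: int,
                         page_bytes: int = 4096,
                         line_bytes: int = 128) -> Tuple[bool, int]:
    """Check whether any line of a page is cached.

    Returns (found, tag_reads), where tag_reads counts how many
    line addresses had to be looked up before the answer was known.
    """
    tag_reads = 0
    for line_addr in range(page_base, page_base + page_bytes, line_bytes):
        tag_reads += 1
        if line_addr in cached_lines:
            return True, tag_reads
    return False, tag_reads
```

In this model, confirming that no copy of a page remains takes one tag read per line in the page (32 reads for the assumed sizes), which is the case that must complete before a directory entry can safely be reset.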