1. Field of the Invention
The present invention relates to a distributed shared memory type multiprocessor system configured using multiple cells. In particular, the present invention relates to a coherency technology in distributed shared memory type multiprocessor systems having multiple cells.
2. Description of the Related Art
In the field of computer systems, a “multiprocessor system” consisting of multiple processors is known. Among multiprocessor systems, those constituted of multiple cells (or multiple nodes) connected by buses and switches are well known. Each of the multiple cells independently contains a processor and a main memory, and a “shared memory” is logically configured from all of the main memories (see, for example, Japanese Patent Laid-Open Nos. 2000-67024, 2000-259596, 1994-110844, 1999-219343, and 2003-216597). Because the memories are distributed and shared by multiple processors, such a system is also called a “distributed shared memory type (DSM) multiprocessor system” or a “distributed shared memory type architecture”.
FIG. 1 is a block diagram showing a configuration of a distributed shared memory type multiprocessor system 1. The multiprocessor system 1 shown in FIG. 1 is composed of multiple cells C1 to Cn (n is an integer showing the number of cells) and a cross over switch 9 mutually connecting these multiple cells C1 to Cn. Each cell Cj (j is an integer from 1 to n) has at least one CPU (microprocessor) 2-j-1 to 2-j-m (m is an integer showing the number of CPUs within each cell) and a main memory (local memory) 5-j. Each of the CPUs 2 is equipped with at least one cache memory (store-in cache) 7. A shared memory is composed of all of the distributed main memories 5-1 to 5-n, and the shared memory can be referenced by all CPUs 2.
In this distributed shared memory type multiprocessor system 1, each CPU 2 uses the cache 7 to access and update the data stored in the main memories 5-1 to 5-n. When a write back method is used for data updating, the updated data remains in the cache 7, so the latest data may not be present in the main memory 5-j. In this situation, if multiple CPUs 2 try to reference or update the identical memory area, correct results cannot be obtained because the data are incoherent. Thus, in the multiprocessor system, it is important to assure consistency of the data (referred to as “cache coherency”) so that each CPU 2 can access the latest data. It is important to maintain coherency among the multiple cells C1 to Cn as well as among the caches 7 within each cell Cj.
As coherency protocols, the “Directory-based Cache Coherency Protocol” and the “Snooping Cache Coherency Protocol” are known.
According to the directory-based cache coherency protocol, a table (directory) is provided to manage the caching status of the data stored in the main memory. When a certain CPU accesses data, the caching status information stored in the directory is examined first. If the desired latest data are determined to be present only in a certain cache rather than in the main memory, the CPU accesses the data in that cache.
According to the snooping cache coherency protocol, when a certain CPU accesses data at a certain access address, all caches examine whether they hold copies of the data at that access address. Each cache changes its own status (cache status), if necessary, so that the latest data can be acquired. For example, according to the MESI protocol, the following four cache statuses are provided: “I: invalid”, “S: shared (the latest data are present in the main memory and in multiple caches)”, “E: exclusive (the latest data are present in the main memory and its own cache)”, and “M: modified (the latest data are present only in its own cache)”.
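As a rough illustration of the four MESI statuses listed above, the following Python sketch models how a cache might change its status when it snoops a read issued by another CPU. The names `State` and `on_snooped_read` are hypothetical, and the transitions follow the commonly described textbook form of MESI rather than any particular implementation.

```python
from enum import Enum


class State(Enum):
    I = "invalid"
    S = "shared"     # latest data in main memory and in multiple caches
    E = "exclusive"  # latest data in main memory and in this cache
    M = "modified"   # latest data only in this cache


def on_snooped_read(state):
    """Return (new_state, must_supply_data) when another CPU's read is snooped.

    Simplified assumption: a cache in M supplies the data and drops to S,
    a cache in E drops to S, and caches in S or I are unchanged.
    """
    if state is State.M:
        # Main memory is stale, so this cache must provide the latest data.
        return State.S, True
    if state is State.E:
        # Main memory is current; the line simply becomes shared.
        return State.S, False
    return state, False


for s in State:
    new_state, supplies = on_snooped_read(s)
    print(f"{s.name} -> {new_state.name}, supplies data: {supplies}")
```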
In each of the cells C1 to Cn shown in FIG. 1, cache coherency circuits 3-1 to 3-n and directories 4-1 to 4-n are provided in order to implement the aforementioned cache coherency. A directory 4-j is a memory area implemented by DRAM or the like, and it stores information for managing the caching status of the data stored in the main memory 5-j of its own cell Cj. More specifically, the directory 4-j stores a table indicating which cell is caching the latest version of the data stored in the main memory 5-j of its own cell Cj. The cell caching the latest data can be its own cell or another cell. The cache coherency circuit 3 is configured to assure coherency among the cells C1 to Cn by referring to the directory 4. The cache coherency within each cell can be implemented, for example, by the snooping cache coherency protocol.
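The role of the directory 4-j can be pictured with a minimal Python sketch. This is only an illustration under stated assumptions: the class `Directory`, its method names, and the use of a dictionary keyed by address are all hypothetical, and a real directory would be a hardware table rather than a Python object.

```python
class Directory:
    """Hypothetical model of a directory 4-j for the main memory of one cell.

    Maps an address to the id of the cell whose cache holds the latest data.
    An address with no entry is assumed to be current in main memory.
    """

    def __init__(self):
        self._latest_holder = {}

    def record(self, address, cell_id):
        # Note that cell_id now caches the latest data for this address.
        self._latest_holder[address] = cell_id

    def written_back(self, address):
        # The latest data has been written back, so main memory is current.
        self._latest_holder.pop(address, None)

    def lookup(self, address):
        # Returns the caching cell's id, or None if main memory is current.
        return self._latest_holder.get(address)


directory = Directory()
directory.record(0x100, "C3")
print(directory.lookup(0x100))  # cell to which a snoop request would be sent
directory.written_back(0x100)
print(directory.lookup(0x100))  # None: main memory holds the latest data
```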
A cell containing the CPU 2 that issues a read request for certain data is hereinafter referred to as “a request cell CR”. A cell containing the main memory 5 in which the data to be accessed is stored is hereinafter referred to as “a home cell CH”. Also, a cell having a cache 7 caching the latest version (latest data) of the data to be accessed is hereinafter referred to as “an owner cell CO”. The inventors of the present application focused on the fact that, in the multiprocessor system 1 using the directory 4, a situation can occur in which the request cell CR, the home cell CH, and the owner cell CO are all different cells.
Operational examples of the conventional multiprocessor system in such a state will be explained with reference to FIG. 2. Initially, the CPU 2 of the request cell CR issues a read request to the home cell CH having the main memory 5 storing the data to be accessed (Step S1). The home cell CH searches (retrieves) its own directory 4 in response to the read request. From the information stored in the directory 4, the home cell CH detects the fact that the latest version of the target data is stored in the cache 7 of the owner cell CO. Then, the home cell CH issues a snoop request to the owner cell CO (Step S2).
In response to the snoop request, the owner cell CO reads out the latest data from the cache 7. In order to write the latest data back to the main memory 5 of the home cell CH, the owner cell CO issues a write back to the home cell CH (Step S3). The write back issued by the owner cell CO in response to the snoop request is hereinafter referred to as a “reply write back”. Simultaneously with this reply write back, the owner cell CO directly transmits the latest data it has read to the request cell CR as reply data (Step S3′). As a result, the latency from the issuing of the read request to the reception of the reply data is 3 HOP (Steps S1, S2, S3′), where the latency of one transfer between cells is counted as 1 HOP.
The CPU 2 of the request cell CR that issued the read request stores the received reply data in its own cache 7. Subsequently, when the reply data in the cache 7 is replaced, a write back from the request cell CR to the home cell CH may be generated (Step S3″). The write back issued when the CPU 2 replaces data in the cache 7 is hereinafter referred to as a “request write back”. Since the aforementioned reply write back (Step S3) and the request write back (Step S3″) travel by different paths, the order in which they arrive at the home cell CH is not uniquely determined. If Step S3 is carried out “later” than Step S3″, the latest data written by the request cell CR to the main memory 5 of the home cell CH is overwritten by the old data from the owner cell CO. That is, when Step S3 and Step S3″ are processed in the reverse order, coherency is not assured.
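The ordering hazard between Step S3 and Step S3″ can be reproduced with a small Python sketch. It is a toy model under stated assumptions: the function `apply_write_backs`, the address `0x100`, and the string values are hypothetical, and the main memory 5 of the home cell CH is modeled as a dictionary that simply applies write backs in arrival order.

```python
def apply_write_backs(arrivals, address=0x100):
    """Apply write backs to a model of the home cell's main memory.

    `arrivals` is a list of (step_name, value) pairs in arrival order.
    Returns the value left in main memory after all write backs.
    """
    main_memory = {}
    for _step, value in arrivals:
        main_memory[address] = value  # last arrival wins
    return main_memory[address]


OLD = "data read from owner cell CO"
NEW = "data later updated by request cell CR"

reply_write_back = ("S3", OLD)      # owner cell CO -> home cell CH
request_write_back = ("S3''", NEW)  # request cell CR -> home cell CH

# Expected order: S3 arrives first, then S3''. Memory ends with the new data.
print(apply_write_backs([reply_write_back, request_write_back]))

# Reversed order: S3 arrives after S3''. The old data overwrites the new data.
print(apply_write_backs([request_write_back, reply_write_back]))
```

In the second call the home cell's memory ends up holding the older value, which is the loss of coherency described above.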
An operational example that solves this problem will be explained with reference to FIG. 3. Initially, the request cell CR issues a read request to the home cell CH (Step S1). Next, the home cell CH issues a snoop request to the owner cell CO (Step S2). In response to the snoop request, the owner cell CO reads the latest data from the cache 7 and issues a reply write back to the home cell CH (Step S3). Here, the owner cell CO does not directly transmit the latest data to the request cell CR. Next, the home cell CH updates its own main memory with the latest data in response to the reply write back. The home cell CH then transmits the latest data to the request cell CR as reply data (Step S4).
With this processing, the ordering problem between Step S3 and Step S3″ shown in FIG. 2 does not occur, so coherency between cells can be assured. However, in the case of the processing shown in FIG. 3, the latency from the issuing of the read request to the reception of the reply data becomes 4 HOP (Steps S1, S2, S3, S4). This implies a reduction in processing speed. Because multiple processors are used in a multiprocessor system precisely in order to improve processing speed, this reduction in processing speed is a serious problem.
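The difference in latency between the processing of FIG. 2 and that of FIG. 3 can be checked by counting inter-cell transfers, as in the following Python sketch. The lists `FIG2_READ_PATH` and `FIG3_READ_PATH` and the function `hops` are hypothetical names introduced only for this illustration; each tuple records a step, its source cell, and its destination cell as described in the text.

```python
# Each entry: (step, source cell, destination cell)
FIG2_READ_PATH = [
    ("S1", "CR", "CH"),   # read request
    ("S2", "CH", "CO"),   # snoop request
    ("S3'", "CO", "CR"),  # reply data sent directly to the request cell
]

FIG3_READ_PATH = [
    ("S1", "CR", "CH"),  # read request
    ("S2", "CH", "CO"),  # snoop request
    ("S3", "CO", "CH"),  # reply write back
    ("S4", "CH", "CR"),  # reply data sent by the home cell
]


def hops(path):
    """Count latency in HOP, taking one transfer between cells as 1 HOP."""
    return sum(1 for _step, src, dst in path if src != dst)


print("FIG. 2:", hops(FIG2_READ_PATH), "HOP")
print("FIG. 3:", hops(FIG3_READ_PATH), "HOP")
```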