1. Field of the Invention
The present invention relates to a shared memory type multi-processor system in which a plurality of processors are connected and a shared memory space shared by the processors is arranged, and more specifically to a system comprising processors each having a shared memory cache for caching data in the shared memory space. Software processing is performed by each individual processor. The shared memory is used as the space for transferring data when processing is handed over between processors, and for storing information which should be managed not by individual processors but by the system as a whole. The shared memory cache is introduced to improve the performance of the system by speeding up access to the shared memory.
2. Description of the Related Art
FIG. 1 shows a conventional example of the simplest shared memory type multi-processor system.
A plurality of processors and a shared memory are connected to the same global bus, and each processor accesses the shared memory via this global bus. Each of the processors (1a-1) to (1a-n) transmits a bus request signal (1c-1) to (1c-n) to an arbiter (1b). The arbiter arbitrates the right to use the global bus (1e), gives that right to only one processor at a time, and transmits a bus permission signal (1d-1) to (1d-n) to that processor. The processor which has received the bus permission signal accesses the shared memory (1f) via the global bus and receives the desired data.
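The arbitration described above can be sketched as follows. This is only an illustrative model, not the circuit of FIG. 1: the class name `Arbiter`, the helper `run_accesses`, and the first-come-first-served grant order are assumptions introduced here, since the text does not state an arbitration policy.

```python
from collections import deque


class Arbiter:
    """Hypothetical model of the arbiter (1b): one bus owner at a time."""

    def __init__(self):
        self.pending = deque()  # processors with an asserted bus request
        self.owner = None       # processor currently holding the global bus

    def request(self, proc_id):
        # Bus request signal; a repeated request from the same processor is ignored.
        if proc_id != self.owner and proc_id not in self.pending:
            self.pending.append(proc_id)

    def grant(self):
        # Bus permission signal: issued only while the bus is free.
        # FIFO order is an assumption made for this sketch.
        if self.owner is None and self.pending:
            self.owner = self.pending.popleft()
        return self.owner

    def release(self, proc_id):
        if self.owner == proc_id:
            self.owner = None


def run_accesses(requests, shared_memory):
    """requests: list of (proc_id, address). Returns (proc_id, value) in grant order."""
    arbiter = Arbiter()
    wanted = {}
    for proc_id, address in requests:
        arbiter.request(proc_id)
        wanted[proc_id] = address
    results = []
    while True:
        owner = arbiter.grant()
        if owner is None:
            break
        # Only the bus owner reads the shared memory over the global bus.
        results.append((owner, shared_memory[wanted[owner]]))
        arbiter.release(owner)
    return results
```

In this model every access is serialized through `grant` and `release`, so a second requester always waits for the first.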
In the implementation shown in FIG. 1, all access to the shared memory, whether read or write, is performed via the global bus. Here, there are two restrictions.
Restriction 1: It takes time to transmit the signals. (Physical restriction)
Restriction 2: It takes time to wait and acquire the right to use the global bus. (Theoretical restriction)
The former arises because, when the signal transmission distance in the global bus becomes long and a plurality of processors share the same signal line, the electrical conditions make it difficult to transmit signals at high speed. The latter arises because, when two or more processors access the shared memory at the same time, the second and subsequent processors must wait for access while the right to use the global bus is arbitrated. As a result, these restrictions give rise to the following problems in accessing the shared memory space.
Problem 1: Shortage of bandwidth (the number of accesses per unit time which the system can permit)
Problem 2: Excess of latency (time required from the start of access to the end of access)
FIG. 2 shows a conventional example in which a shared memory cache (2h) is arranged in each processor.
When a processor core (2g) reads the shared memory space and a copy of the data of the shared memory space exists in the shared memory cache, the read processing can be completed within the processor via an internal bus (2i), and Restriction 1 can be reduced thereby. Since the access to the shared memory space is not performed via the global bus, the arbitration of the right to use the global bus is not required, so that the processor (2a) is released from Restriction 2. In this respect, the introduction of the shared memory cache can be a measure for solving the two problems described above.
By introducing the shared memory cache, each processor can hold an individual copy of the data of the shared memory space, but the data in the shared memory space must look the same to all the processors. Consequently, for write processing, which is the occasion on which data is updated, it is absolutely necessary to consider the control of coherency which ensures this. This control of coherency is also an obstacle to solving the above-mentioned problems, the reasons for which will be described later.
Here, the requirements for the control of coherency are divided into three as follows.
Requirement 1: Synchronization in terms of time
Requirement 2: Synchronization in terms of space
Requirement 3: Reduction of update time
FIG. 3 shows the control of coherency. FIG. 3 explains the meaning of said requirements. It is assumed therein that, when the data of an address on the shared memory space is value 0, processor 1 writes value 1 to said address, processor 2 thereafter writes value 2, and the other processors 3 to n read said address. Here, Requirement 1 corresponds to, for example, excluding the possibility of reading the values in the order from 2 to 1 (ensuring t1≧0). Requirement 2 corresponds to, for example, excluding the possibility that, although one processor has already read value 1, another processor later reads value 0 (ensuring t2≧0). Requirement 3 corresponds to shortening, as much as possible, both the time during which the other processors still read the data before updating after the data is updated, and the time from when the data is updated until the other processors can read the data after updating (minimization of t2 and t3). Requirement 3 is not an indispensable requirement for the control of coherency, but is required to improve the performance of the system.
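Requirements 1 and 2 can be stated as checks on the values that processors are observed to read. The following is a minimal sketch under assumptions not found in the text: the function names are hypothetical, written values are assumed to be distinct, and each read is assumed to carry a timestamp on a common clock.

```python
def satisfies_requirement_1(observed, write_order):
    """observed: values read by ONE processor, in the order it read them.

    True if the processor never sees an older value after a newer one,
    for example never reads 2 and then 1 when the writes were 0, 1, 2.
    """
    rank = {value: i for i, value in enumerate(write_order)}
    ranks = [rank[value] for value in observed]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


def satisfies_requirement_2(timed_reads, write_order):
    """timed_reads: (time, processor, value) tuples across ALL processors.

    True if no read returns an older value strictly later than some
    other read already returned a newer value.
    """
    rank = {value: i for i, value in enumerate(write_order)}
    return not any(
        t_a < t_b and rank[v_a] > rank[v_b]
        for t_a, _, v_a in timed_reads
        for t_b, _, v_b in timed_reads
    )
```

For the scenario of FIG. 3, `write_order` would be `[0, 1, 2]`. Because the second check compares every pair of reads, it also covers reads made by a single processor.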
One example of the control of coherency shown in FIG. 3 is the following method. Every time a processor performs a write process to the shared memory space, the processor reflects the write process to its own shared memory cache and, at the same time, writes to the shared memory via the global bus. The other processors monitor the write access appearing on the global bus and, when data of said address is in their own shared memory cache, replace that data with the data on the global bus.
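The method above can be sketched as a small model. The names `GlobalBus`, `Processor`, `broadcast_write` and `snoop_write` are hypothetical, bus arbitration and timing are left out, and two behaviours are assumptions made for the sketch: the writer always stores the written value in its own cache, and a read miss fetches the value from the shared memory.

```python
class GlobalBus:
    """Hypothetical model of the global bus with the shared memory attached."""

    def __init__(self, shared_memory):
        self.shared_memory = shared_memory  # dict: address -> value
        self.processors = []

    def attach(self, processor):
        self.processors.append(processor)

    def broadcast_write(self, writer, address, value):
        # The write reaches the shared memory via the global bus ...
        self.shared_memory[address] = value
        # ... and every other processor monitors the same write access.
        for processor in self.processors:
            if processor is not writer:
                processor.snoop_write(address, value)


class Processor:
    def __init__(self, bus):
        self.cache = {}  # shared memory cache: address -> value
        self.bus = bus
        bus.attach(self)

    def read(self, address):
        # A hit is served from the cache without using the global bus.
        if address not in self.cache:
            # Assumption: a miss fetches the value from the shared memory.
            self.cache[address] = self.bus.shared_memory[address]
        return self.cache[address]

    def write(self, address, value):
        # Reflect the write to the own cache and write through to the shared memory.
        self.cache[address] = value
        self.bus.broadcast_write(self, address, value)

    def snoop_write(self, address, value):
        # Replace the cached copy only if this address is already cached.
        if address in self.cache:
            self.cache[address] = value
```

A processor that has not cached the address ignores the monitored write, so its cache is not filled with data it never read.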
FIG. 4 shows an example of the method of establishing cache coherency. FIG. 4 is an example of the processing sequence based on the above-mentioned method. The timings (4a) to (4f) shown in the figure correspond to the following events.
    (4a): The processor starts write access.
    (4b): When write access is started, the processor transmits a global bus request.
    (4c): The processor receives bus use permission, and outputs the address and data to the global bus.
    (4d): The other processors and the shared memory receive the information from the global bus, and write it to the shared memory or to their own shared memory cache.
    (4e): Write to the memory is completed.
    (4f): The processor which started the write access releases the bus.
In this example, conditions necessary to ensure coherency are indicated by the following expressions.
    trc(min) > tdsd(max) + tdmw(max)  (1)
    tdsd(max) < tdsd(min) + tdmw(min)  (2)
Here,
    trc: Time required from the issue of a write to the global bus to the release of the bus
    tdsd: Time required for the other processors to recognize the issue of a write to the global bus
    tdmw: Time required for the processor and the shared memory to recognize a write access on the global bus and reflect the data to themselves
Here, expression (1) is a condition for satisfying Requirement 1, and guarantees that the processor releases the global bus only after the write value has been reflected to the shared memory and to the shared memory caches on all the processors. (Generally, a sequence is employed in which a response of write completion is transmitted from the side of the shared memory into which the data is written, and the bus is released when the processor receives the response.) When said condition is satisfied, it is guaranteed that the previous write processing has been completed by the time the next processor begins write processing according to the arbitration of the right to use the global bus. That is, it is as if the requirements for the control of coherency were satisfied by a disadvantage of the global bus, but in fact there is no essential difference, because Requirement 1 itself requires arbitration for updating data. This is because guaranteeing the order of updating data is equivalent to guaranteeing that a plurality of data updates do not occur at the same time, namely, performing arbitration. Therefore, to satisfy Requirement 1 for the control of coherency means that Restriction 2, which arises in using the global bus, is imposed in the same way, thereby causing an obstacle to solving the problems.
Expression (2) is a condition for satisfying Requirement 2 by absorbing the variation in the timing of (4d) shown in FIG. 4. When read access contending with a write access arising on the global bus is started on a processor, the timing of (4d) is the boundary which determines whether the data before updating or the data after updating is returned to the processor core. The timing at which the data after updating is returned is the timing of (4e), so if expression (2) is not satisfied, the order of these timings is reversed depending on the processor, which is contrary to Requirement 2.
Here, expression (1) indicates that the bus occupation time must be longer than a specific time, that is, that a restriction is imposed on the bandwidth of the shared memory space. Expression (2) indicates that, even if an effort is made to shorten the time for writing data into the shared memory cache and the shared memory in order to increase the bandwidth, that time must be kept above a specific time because the timing of (4d) fluctuates among processors. As seen from these examples, since conditions are attached to the timing of various operations, the control of coherency itself becomes a restriction when an effort is made to shorten the processing time and to improve the performance of the system.
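Expressions (1) and (2) can be evaluated for a given set of timing values as in the following sketch. The function name `check_coherency_timing` and its parameter names are hypothetical, and the values in the usage lines are made-up figures in arbitrary time units, not measurements from any system.

```python
def check_coherency_timing(trc_min, tdsd_min, tdsd_max, tdmw_min, tdmw_max):
    """Return (expression_1_holds, expression_2_holds).

    Expression (1): trc(min)  > tdsd(max) + tdmw(max)
    Expression (2): tdsd(max) < tdsd(min) + tdmw(min)
    """
    expression_1 = trc_min > tdsd_max + tdmw_max
    expression_2 = tdsd_max < tdsd_min + tdmw_min
    return expression_1, expression_2


# Hypothetical values: both conditions hold.
ok = check_coherency_timing(trc_min=100, tdsd_min=10, tdsd_max=20,
                            tdmw_min=30, tdmw_max=50)

# Hypothetical values: the bus is released too early and the recognition
# time varies more than the shortest write time can absorb.
bad = check_coherency_timing(trc_min=60, tdsd_min=10, tdsd_max=45,
                             tdmw_min=30, tdmw_max=50)
```

Rearranging expression (2) as tdmw(min) > tdsd(max) - tdsd(min) shows that the shortest write time must exceed the spread of the recognition time, which is why the write time cannot be shortened freely.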
Patent Document 1 is available as a conventional technology for securing coherency among cache memories. In Patent Document 1, a processor module has a cache memory, and issues a coherency transaction to other processor modules via a bus. The processor modules which have received the coherency transaction perform an examination of coherency. When data update is implemented to maintain the coherency, the data to be used for data update is transmitted via the bus. A signal line connecting the processor modules and the main memory is used to provide notification of the results of the examination of coherency.
Patent Document 1: Kokai (unexamined patent publication) No. 7-281956