1. Field of the Invention
The present invention relates to a multiprocessor including a plurality of processor units connected in common to a global bus.
2. Description of Related Art
FIG. 21 is a block diagram showing a conventional multiprocessor. In FIG. 21, reference numerals 1 and 1A each designate a processor unit comprising a CPU 5 and a local cache memory 6 that has a write through function and a write monitoring function. The local cache memory 6 of each of the processor units 1 and 1A is connected to a common global bus 2, which is connected to an external memory 4 through an interface 3. An instruction cache is not shown, because the subject matter here is the data cache rather than the instruction cache.
Next, the operation of the conventional multiprocessor will be described.
The CPU 5 exchanges data with the external memory 4 through the global bus 2 and the interface 3. However, the low processing rate of the global bus 2 and the interface 3 creates a bottleneck that prevents the CPU 5 from operating at its full processing rate.
Various schemes have therefore been proposed that improve the rate by keeping, near the CPU 5, copies of those contents of the external memory 4 that the CPU 5 uses frequently. The local cache memory 6 is placed close to the CPU 5 for that purpose.
The operation of the local cache memory 6 will now be described.
1. Read Operation of the Local Cache Memory 6.
Assume that the CPU 5 reads address 0013 of the external memory 4. The local cache memory 6 checks whether it holds the content of the address 0013. If it does, it supplies that content to the CPU 5. As a result, the CPU 5 can operate at its original high rate without using the low speed global bus 2 and interface 3.
If the local cache memory 6 does not hold the content of the address 0013, it selects one of its stored contents that the CPU 5 is not expected to use for a considerable time (the selection method is omitted here because it is not the subject matter of the present invention), erases that content (that is, eliminates it from the cache after writing it into the external memory, as will be described later), and transfers the content of the address 0013 into the freed space. Since the local cache memory 6 now holds the content of the address 0013, the CPU 5 can thereafter read it quickly. This mechanism is referred to as a "purge".
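The read path just described (hit, miss, and purge) can be sketched in Python as follows. This is a minimal model under stated assumptions, not the circuit of FIG. 21: the class and attribute names are hypothetical, the external memory 4 is a plain dictionary, and the victim is chosen in least-recently-used (LRU) order only as a stand-in for the selection method the text leaves open.

```python
from collections import OrderedDict


class LocalCache:
    """Hypothetical model of the read path of the local cache memory 6.

    Assumption: the victim for a purge is the least recently used entry;
    the text leaves the selection method unspecified.
    """

    def __init__(self, memory, capacity):
        self.memory = memory        # external memory 4: address -> value
        self.capacity = capacity    # number of entries the cache can hold
        self.lines = OrderedDict()  # cached address -> value, oldest first
        self.bus_reads = 0          # reads that had to use the slow global bus

    def purge(self):
        # Write the least recently used entry to memory, then drop it.
        addr, value = self.lines.popitem(last=False)
        self.memory[addr] = value

    def read(self, addr):
        if addr in self.lines:
            # Hit: served locally without touching the global bus.
            self.lines.move_to_end(addr)
            return self.lines[addr]
        # Miss: free a slot if the cache is full, then fetch over the bus.
        if len(self.lines) >= self.capacity:
            self.purge()
        self.bus_reads += 1
        self.lines[addr] = self.memory[addr]
        return self.lines[addr]


memory = {13: 42, 20: 7, 30: 9}   # address 0013 written as 13
cache = LocalCache(memory, capacity=2)
cache.read(13)                   # miss: fetched over the global bus
cache.read(13)                   # hit: no bus access
print(cache.bus_reads)           # 1
```

Only the first read of address 0013 uses the global bus; with a capacity of two, reading two further addresses would purge it.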
2. Write Operation of the Local Cache Memory 6.
There are two methods by which the CPU 5 writes data to the external memory 4: a write back method and a write through method.
First, the write through method will be described. When the CPU 5 writes data to the address 0013 of the external memory 4, the local cache memory 6 checks whether it holds the content of the address 0013, as in the read operation. If it does, the local cache memory 6 updates both its own copy of the address 0013 and the content of that address in the external memory 4. If it does not, it eliminates a content that it expects the CPU 5 will not use, and writes the data for the address 0013 both into the freed space and into the corresponding address of the external memory 4. As a result, the low-rate global bus 2 and interface 3 are used on every write operation.
Second, the write back method will be described. The write back method differs from the write through method in the timing of writes. In the write back method, the data is written into the local cache memory 6 but not into the external memory 4 at that instant; it is written into the external memory 4 only when the local cache memory 6 purges it. As a result, the low-rate global bus 2 and interface 3 are used only during purges, so the write back method achieves a higher operation rate than the write through method.
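The difference in write timing between the two methods can be illustrated with a small Python model that counts global-bus writes. It is a hypothetical sketch: the `WriteCache` class, its `policy` parameter, and LRU victim selection are assumptions for illustration. Both policies allocate a cache entry on a write miss, as in the description above.

```python
from collections import OrderedDict


class WriteCache:
    """Hypothetical model of the write through and write back methods.

    policy is "through" or "back". Both allocate a line on a write miss,
    as in the description. Victims are chosen in LRU order (an assumption).
    """

    def __init__(self, memory, capacity, policy):
        if policy not in ("through", "back"):
            raise ValueError("policy must be 'through' or 'back'")
        self.memory = memory
        self.capacity = capacity
        self.policy = policy
        self.lines = OrderedDict()  # cached address -> value, oldest first
        self.dirty = set()          # written addresses not yet in memory
        self.bus_writes = 0         # writes that used the slow global bus

    def _purge(self):
        addr, value = self.lines.popitem(last=False)
        if addr in self.dirty:
            # Write back: the external memory is updated only at purge time.
            self.memory[addr] = value
            self.dirty.discard(addr)
            self.bus_writes += 1

    def write(self, addr, value):
        if addr not in self.lines and len(self.lines) >= self.capacity:
            self._purge()
        self.lines[addr] = value
        self.lines.move_to_end(addr)
        if self.policy == "through":
            # Write through: the external memory is updated on every write.
            self.memory[addr] = value
            self.bus_writes += 1
        else:
            self.dirty.add(addr)


for policy in ("through", "back"):
    memory = {13: 0}
    cache = WriteCache(memory, capacity=4, policy=policy)
    for i in range(100):
        cache.write(13, i)
    print(policy, cache.bus_writes, memory[13])
# through 100 99
# back 0 0
```

With 100 writes to one address, the write through cache uses the bus 100 times, while the write back cache uses it not at all; the price is that the external memory still holds the old value until the entry is purged.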
3. Application of the Local Cache Memory 6 to a Multiprocessor.
When applied to a multiprocessor, the local cache memory 6 must operate in the write through mode and must also have a "monitoring function" for the write operations of the other CPUs.
The write through mode is required (that is, the write back method cannot be used) for the following reason. When data is written to the address 0013 in the write back mode, it is not written into the external memory 4 until it is purged. Another CPU that reads the address 0013 before the purge will therefore obtain stale data that has not yet been updated.
Even in the write through mode, however, if the local cache memory of another CPU already holds the content of the address 0013, that copy is not updated by the write. It is therefore necessary for each local cache memory 6 to monitor the write operations of the other local cache memories, and to invalidate its own copy of the written address whenever it detects that address in its address information.
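The need for write monitoring in the write through mode can be shown with a minimal Python sketch of two caches on one bus. The `Bus` and `WriteThroughCache` classes, the `snoop` method, and the `monitoring` flag are hypothetical names for this illustration; the sketch only models invalidation of a copy when another CPU's write is observed on the bus.

```python
class Bus:
    """Hypothetical global bus shared by all caches and the external memory."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def write(self, source, addr, value):
        self.memory[addr] = value
        # Every other cache observes the write on the bus.
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr)


class WriteThroughCache:
    """Write through cache with optional write monitoring (invalidation)."""

    def __init__(self, bus, monitoring):
        self.bus = bus
        self.monitoring = monitoring
        self.lines = {}             # cached address -> value
        bus.caches.append(self)

    def snoop(self, addr):
        # Write monitoring: drop our copy of an address another CPU wrote.
        if self.monitoring:
            self.lines.pop(addr, None)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)


def read_after_remote_write(monitoring):
    bus = Bus({13: 0})
    cpu_a = WriteThroughCache(bus, monitoring)
    cpu_b = WriteThroughCache(bus, monitoring)
    cpu_b.read(13)            # B caches the old value 0
    cpu_a.write(13, 99)       # A writes through to the external memory
    return cpu_b.read(13)     # what B sees afterwards


print(read_after_remote_write(monitoring=False))  # 0: stale copy in B
print(read_after_remote_write(monitoring=True))   # 99: B's copy was invalidated
```

Without monitoring, the external memory is up to date but CPU B keeps reading its stale cached copy; with monitoring, B's copy is invalidated and the next read fetches the new value.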
To maintain the consistency of data among the local cache memories, or between the local cache memories and a shared memory, various cache memory configurations for multiprocessors have been proposed. For example, Japanese patent application laid-open Nos. 2-22757/1990 and 4-175946/1992 employ a technique that divides data into shared and unshared data, accesses different memories depending on whether the data is shared or unshared, and invalidates data in the cache memories by monitoring writes of the shared data in the manner described above.
U.S. Pat. No. 4,939,641 discloses a method that keeps shared/unshared information in the cache memory and reads and writes the cache using the write back method for unshared data and the write through method for shared data. In short, these methods rely on "write monitoring". There are countless configurations comprising multiple processors and cache memories, and some of them likewise assume "write monitoring".
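The idea of selecting the write policy according to shared/unshared information can be sketched in a few lines of Python. This only illustrates the general idea as summarized above and is not the design disclosed in U.S. Pat. No. 4,939,641; the `HybridPolicyCache` class and its `shared_addresses` parameter are hypothetical stand-ins for the shared/unshared information kept in the cache.

```python
class HybridPolicyCache:
    """Sketch: write through for shared data, write back for unshared data."""

    def __init__(self, memory, shared_addresses):
        self.memory = memory
        self.shared = set(shared_addresses)  # hypothetical shared/unshared info
        self.lines = {}                      # cached address -> value
        self.dirty = set()                   # unshared writes not yet in memory
        self.bus_writes = 0

    def write(self, addr, value):
        self.lines[addr] = value
        if addr in self.shared:
            # Shared data: write through, so the update appears on the bus
            # at once and other caches can monitor it.
            self.memory[addr] = value
            self.bus_writes += 1
        else:
            # Unshared data: defer the memory update until purge.
            self.dirty.add(addr)

    def purge(self, addr):
        value = self.lines.pop(addr)
        if addr in self.dirty:
            self.memory[addr] = value
            self.dirty.discard(addr)
            self.bus_writes += 1


memory = {13: 0, 20: 0}
cache = HybridPolicyCache(memory, shared_addresses={13})
cache.write(13, 1)   # shared: reaches memory immediately
cache.write(20, 2)   # unshared: stays in the cache
print(memory, cache.bus_writes)  # {13: 1, 20: 0} 1
```

Only the shared write uses the bus immediately, which is why such schemes still depend on monitoring the shared writes.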
With the foregoing arrangements, the conventional multiprocessors have the following problems.
The first problem is the time wasted by the monitoring.
Because the monitoring is carried out on every write operation, the CPU cannot use the local cache memory while the monitoring is in progress, which lowers the operation rate of the CPU. For example, assume that a certain task requires 1,000,000 read operations at one clock period each and 10,000 write operations at four clock periods each (the write operations are carried out in the write through mode and are therefore assumed to go through the bus), plus two clock periods of write monitoring per write operation. When the same task is executed by five CPUs, the total number of write operations by all the CPUs is 5 CPUs × 10,000 = 50,000, requiring 50,000 × 2 = 100,000 clock periods for the monitoring.
Since the processing time excluding the monitoring is 1,000,000 + 10,000 × 4 = 1,040,000 clock periods, the monitoring lengthens the total processing time by nearly 10%.
Under the same assumptions, if 20,000 write operations are executed, the processing time excluding the monitoring becomes 1,080,000 clock periods, and the monitoring, which requires 200,000 clock periods, lengthens the total processing time by about 20%. Furthermore, if 10 CPUs each execute 20,000 write operations, the monitoring requires 400,000 clock periods and lengthens the processing time by nearly 40%. The monitoring time thus grows in proportion both to the number of CPUs (and their cache memories) and to the number of write operations.
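The clock-period figures above follow from a simple formula, reproduced below as a hypothetical Python helper `monitoring_overhead`. Its default cycle counts (one per read, four per write, two of monitoring per write) are the assumptions stated in the text. The exact overheads come out at 9.6%, 18.5%, and 37.0%, which the text rounds to nearly 10%, about 20%, and nearly 40%.

```python
def monitoring_overhead(cpus, reads, writes,
                        read_cycles=1, write_cycles=4, monitor_cycles=2):
    """Return (base clock periods, monitoring clock periods, overhead ratio).

    base: the reads and writes of one task, excluding monitoring.
    monitoring: monitor_cycles for every write issued by any of the CPUs.
    """
    base = reads * read_cycles + writes * write_cycles
    monitoring = cpus * writes * monitor_cycles
    return base, monitoring, monitoring / base


for cpus, writes in [(5, 10_000), (5, 20_000), (10, 20_000)]:
    base, mon, ratio = monitoring_overhead(cpus, 1_000_000, writes)
    print(f"{cpus} CPUs, {writes} writes: base={base}, monitoring={mon}, +{ratio:.1%}")
# 5 CPUs, 10000 writes: base=1040000, monitoring=100000, +9.6%
# 5 CPUs, 20000 writes: base=1080000, monitoring=200000, +18.5%
# 10 CPUs, 20000 writes: base=1080000, monitoring=400000, +37.0%
```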
The second problem is the reduction in processing rate caused by the inability to use a write back cache.
Assume that the foregoing task is executed, that 50% of the write operations hit the cache memories, and that each such write operation takes one clock period. The processing time excluding the monitoring then becomes 1,000,000 × 1 + 10,000 × 1/2 × 4 + 10,000 × 1/2 × 1 = 1,025,000 clock periods, about 1.4% shorter than the foregoing 1,040,000 clock periods. If the number of write operations doubles, the processing time becomes 1,050,000 clock periods, about 3% shorter than the foregoing 1,080,000 clock periods. A higher hit ratio would reduce the write time with the write back cache still further. The multiprocessor, however, can use only the slower write through cache, because a write back cache would prevent the other CPUs from reading the updated data.
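The write back comparison can be checked the same way with a hypothetical helper `processing_time`. Under the text's assumptions, `hit_percent=0` models the write through case, in which every write goes over the bus at four clock periods, and `hit_percent=50` models a write back cache in which half the writes hit and take one clock period. The exact reductions are 1.4% and 2.8%.

```python
def processing_time(reads, writes, hit_percent,
                    read_cycles=1, bus_write_cycles=4, cache_write_cycles=1):
    """Clock periods excluding monitoring.

    Writes that hit a write back cache take cache_write_cycles; the rest
    go over the bus and take bus_write_cycles. hit_percent=0 reproduces
    the write through case, in which every write uses the bus.
    """
    hits = writes * hit_percent // 100
    misses = writes - hits
    return (reads * read_cycles
            + misses * bus_write_cycles
            + hits * cache_write_cycles)


for writes in (10_000, 20_000):
    through = processing_time(1_000_000, writes, hit_percent=0)
    back = processing_time(1_000_000, writes, hit_percent=50)
    saving = (through - back) / through
    print(f"{writes} writes: {through} -> {back} ({saving:.1%} shorter)")
# 10000 writes: 1040000 -> 1025000 (1.4% shorter)
# 20000 writes: 1080000 -> 1050000 (2.8% shorter)
```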
The third problem is cost.
When such a multiprocessor system with the write monitoring function is implemented on a single chip, the monitoring adds functions to the cache memory, so the standard cache memories available in the library cannot be used as they are and must be modified. Such modification lengthens the design period accordingly. In addition, the extra function increases the chip layout area. The longer design time and larger layout area inevitably raise the cost of developing and producing the chip.
On the other hand, implementing the monitoring with components outside the chip also presents a problem. Write back and write through caches themselves are available at relatively low cost, because they are widely used in single-processor systems, which do not require write monitoring.
However, cache memories with the "write monitoring" function are difficult to obtain at low cost. Multiprocessors are used only in specialized fields and form a small market, so their components are produced in limited quantities and are expensive.