1. Field of the Invention
The present invention relates to a multiprocessor system having a plurality of processors, a main memory and a plurality of cache memories.
2. Description of the Prior Art
The cache memory is discussed in detail on pp. 473 to 530 of "ACM, Computing Surveys", Vol. 14, No. 3.
The cache memory has a smaller capacity than the main memory but can perform higher-speed accesses (i.e., reads and writes), and is placed close to the processors with a view to speeding up their accesses.
Since the data transfer time between the main memory and a processor is several times as long as the internal processing time of the processor, the processing cannot be sped up if data is read out from the main memory upon each execution of an instruction. Also, although the processor has the main memory in its entirety as its access target, its accesses concentrate in a restricted portion of the main memory during any very short time period. If, therefore, the data of the main memory is partially copied into the cache memory so that most of the accesses of the processor may be executed with the cache memory, the access time can be shortened on average.
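By way of illustration only, the averaging effect described above may be sketched as follows. The function name, the hit ratio and the two access times are assumptions made for this sketch and are not taken from the aforementioned reference.

```python
def average_access_time(hit_ratio, cache_time, memory_time):
    """Mean access time when a fraction hit_ratio of the accesses is served by the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    # Hits cost cache_time; misses are assumed to cost memory_time.
    return hit_ratio * cache_time + (1.0 - hit_ratio) * memory_time


# Hypothetical figures: a cache access takes 1 time unit, a main memory access 5 units.
print(average_access_time(0.9, 1.0, 5.0))
```

With these assumed figures, nine accesses out of ten are served at the cache speed, so the mean access time stays close to that of the cache rather than that of the main memory.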
The cache memory manages its storage in the unit of a suitable size called the "block". Each block holds a copy of the data from a certain position of the main memory. This storage is called the "data array". In addition, which portion of the main memory each block corresponds to can naturally be dynamically changed, and this address data is stored in the cache memory. This storage is called the "address array".
In order that most of the accesses of the processor may be accomplished with the cache memory, the data frequently accessed recently should be stored in the cache whereas the data not accessed recently need not be stored in the cache. The cache memory performs the controls satisfying the requests thus far described, as will be explained with reference to the writing and reading operations as an example.
When data in the cache is accessed, the block holding the data is retained in the cache. When data outside of the cache is accessed, on the contrary, one block of the main memory including the data is transferred to the cache, and one block which has not been recently used is expelled from the cache. Here, the case in which the accessed data is in the cache is called a "hit", and the opposite case is called a "miss".
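The block management, hit and miss described above may be sketched as follows. This is a minimal model under stated assumptions: the class name, the block size of four words, the fully associative organization and the least-recently-used replacement rule are all chosen for illustration and are not specified in the text.

```python
from collections import OrderedDict

BLOCK_WORDS = 4  # assumed block size in words


class Cache:
    """Fully associative cache with least-recently-used replacement (an assumed policy)."""

    def __init__(self, num_blocks, main_memory):
        self.num_blocks = num_blocks
        self.memory = main_memory  # list of words standing in for the main memory
        # Keys play the role of the address array, values that of the data array.
        # Entries are ordered from least to most recently used.
        self.blocks = OrderedDict()

    def read(self, address):
        """Return (data, hit) for a word address."""
        block_addr, offset = divmod(address, BLOCK_WORDS)
        hit = block_addr in self.blocks
        if hit:
            self.blocks.move_to_end(block_addr)  # retain the block, mark it recently used
        else:
            if len(self.blocks) >= self.num_blocks:
                self.blocks.popitem(last=False)  # expel the least recently used block
            start = block_addr * BLOCK_WORDS
            self.blocks[block_addr] = list(self.memory[start:start + BLOCK_WORDS])
        return self.blocks[block_addr][offset], hit


memory = list(range(100, 132))  # 32 words, i.e., 8 blocks
cache = Cache(num_blocks=2, main_memory=memory)
print(cache.read(5))  # miss: the block holding word 5 is transferred
print(cache.read(6))  # hit: word 6 belongs to the same block
```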
The store method of the cache is divided into the write-through method and the copy-back method, as has been specified on pp. 473 to 530 of the aforementioned reference "ACM, Computing Surveys" Vol. 14, No. 3.
According to the former write-through method, if a write by the processor hits the cache, the writing operation is performed in both the inside cache and the outside main memory, so that their contents coincide at all times.
According to the latter copy-back method, on the contrary, if a write by the processor hits the cache, the writing operation is carried out only in the inside cache and not in the outside main memory.
Comparing these two methods, the copy-back method is more complex in its control than the write-through method, so that its control requires a larger amount of hardware. This requirement is especially serious in the control for holding the consistency of the contents of the caches in a shared-memory multiprocessor.
Since, however, the main memory is not written when the cache is hit in the copy-back method, the write of the main memory can be omitted to provide an advantage in performance over the write-through method. Another advantage is that the traffic to the main memory can be reduced by avoiding the writing time to the main memory. This reduction in the traffic to the main memory is an important point for improving the total performance of the multiprocessor system.
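The difference in main memory traffic between the two store methods may be sketched as follows. The function and its counting rule are assumptions made for illustration: the sketch counts a copy-back write-back of a whole block as a single main memory write, and considers only a run of write hits to one cached block.

```python
def main_memory_writes(write_hits, policy):
    """Count main memory writes caused by a run of write hits to one cached block.

    policy is "write-through" or "copy-back". For copy-back, the single
    write-back performed when the block is later expelled is included.
    """
    if policy == "write-through":
        return write_hits  # every write is also performed in the main memory
    if policy == "copy-back":
        return 1 if write_hits > 0 else 0  # one write-back when the updated block is expelled
    raise ValueError("unknown policy: " + policy)


for policy in ("write-through", "copy-back"):
    print(policy, main_memory_writes(10, policy))
```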
Thus, in a multiprocessor system, the plural processors share the main memory and also have individual cache memories. When a write request or a read request made by a processor misses its cache memory, the request is transmitted to the main memory through the missed cache memory. Therefore, it can be considered that the missed cache memories share the main memory for writing and reading operations.
In addition to the cache memories, there exists another device which is connected to a main memory bus and enabled to write to or read from the main memory directly. This device is exemplified by a video controller or a DMA (i.e., Direct Memory Access) controller for supporting a virtual memory. These devices for the direct read or write of the main memory will be generally called the "input/output device" in the present invention.
The multiprocessor system thus far described is equipped with a plurality of devices (e.g., a plurality of processors or a plurality of input/output devices) for reading or writing the main memory. Then, it is a natural requisite for the processor system to transfer the data from one device to another. In the system including the plural cache memories, however, the stored contents of the cache memories after the data transfer are not always correct. This is because a plurality of cache memories holding a certain address exist in the system but may have an inconsistency in data thereamong.
Two examples of this inconsistency will be explained in the following. In the first example, the main memory is written by a device other than the cache. The main memory then holds the latest content, but a cache that stores a copy made before the write does not have its stored content updated and so fails to hold the consistency of data. This case is called the "first problem" of the data inconsistency.
The second example is directed to a problem which is raised when the data is written in the cache memory from the processor by the copy-back method. As has been described hereinbefore, the method of processing the writing access in the cache is roughly divided into the copy-back method and the write-through method. According to the write-through method, the main memory is always written, and the cache is written only in case it is hit. According to the copy-back method, only the cache is written in case it is hit, and, in case it is missed, the cache is written after the block is replaced. Thus, according to the copy-back method, the data is written only in the cache so that the stored content of the cache itself is the latest. However, the content of the main memory and the contents of other caches are not updated, so that the consistency of the data is not kept. This case is called the "second problem" of the data inconsistency.
The aforementioned first problem of data inconsistency can be solved by a method based upon the concept of invalidation. If a block containing a write address exists in another cache, it is controlled so as to be eliminated and invalidated at each time of the write of the main memory, thus eliminating the inconsistency of the data, which has been described in connection with the first problem. This control is feasible in the system sharing a single main memory bus. Specifically, when a bus master on the main memory bus uses the main memory bus for the write, all the remaining caches watch the address of the main memory so that the block containing the write address may be invalidated if it exists in the pertinent cache. This invalidation is executed by changing a valid flag (which will be called the "V bit") of 1 bit, which is assigned to each block for indicating the validity of the same, from the logic "1" to the logic "0".
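The invalidation control described above may be sketched as follows. The class and method names are hypothetical, and the data held by each block is omitted; only the V bit and the watching of the write address on the shared main memory bus are modeled.

```python
class SnoopingCache:
    """Holds one V bit per cached block address; the block data is omitted for brevity."""

    def __init__(self, name):
        self.name = name
        self.valid = {}  # block address -> V bit (1 = valid, 0 = invalid)

    def load(self, block_addr):
        self.valid[block_addr] = 1

    def snoop_write(self, block_addr):
        """Called for every main memory write issued by another bus master."""
        if self.valid.get(block_addr) == 1:
            self.valid[block_addr] = 0  # V bit changed from logic "1" to logic "0"


class MainMemoryBus:
    def __init__(self, caches):
        self.caches = caches

    def write(self, master, block_addr):
        # All caches other than the bus master watch the write address.
        for cache in self.caches:
            if cache is not master:
                cache.snoop_write(block_addr)


a, b = SnoopingCache("A"), SnoopingCache("B")
bus = MainMemoryBus([a, b])
a.load(7)
b.load(7)
bus.write(a, 7)  # cache A is the bus master; the copy in cache B is invalidated
print(a.valid[7], b.valid[7])
```

In this sketch a bus master that is not a cache, such as an input/output device, can be passed as any object outside the cache list, in which case every cache holding the block is invalidated.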
Now, a variety of solutions have been conceived in the prior art for the second problem. A representative one of these solutions is disclosed in Japanese Patent Publication No. 52-14064, which claims priority from U.S. patent application Ser. No. 174,824 filed on Aug. 25, 1971. In this method, the data consistency is held while using the copy-back method, as schematically shown in FIG. 10. Reference numerals 201 and 202 designate processors; numerals 203 and 204 designate cache memories; numeral 205 designates a main memory; and numeral 206 designates a main memory bus. Each block of the caches 203 and 204 has, as an MC bit (i.e., a Multi-Copy bit indicating the existence of a plurality of copies), the information indicating whether or not an identical copy exists in another cache.
According to the copy-back method, therefore, only the cache is written. Upon this write, the identical copy, if any in another cache, is invalidated by sending the address to a broadcast bus 207 wired between the caches. If not in another cache, the sending of the address to the broadcast bus 207 is omitted. If, moreover, each block of the cache has a U bit (i.e., an update bit indicating the update of the data only in the cache), the U bit is set to the logic "1" upon the write of only the cache according to the copy-back method. Since the block of U=1 has inconsistency in data with the main memory, its data is written back in the main memory when the block is expelled in accordance with the block replacement and has the V bit and the U bit at the logic "1".
When, moreover, another bus master transfers the data, which belongs to the block having the V bit and the U bit at "1", from the main memory, a control is made to keep the old data of the main memory away from any access because the data of the pertinent block of the main memory is not the latest. Specifically, this access is interrupted, and one block including the inconsistency is newly written in the main memory to update the content of the main memory to the latest.
It should be noted that a signal line 208 for operating the MC bit of another cache is connected between the caches so as to enable the controls thus far described.
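The roles of the V bit, the U bit and the MC bit in the control thus far described may be sketched as follows. This is a simplified model and not the circuit of the cited publication: the class and method names are hypothetical, one block is treated as one data item, and the clearing of the MC bit after the invalidation is an assumption of the sketch.

```python
class Block:
    def __init__(self, data, mc):
        self.data = data
        self.v = 1    # V bit: the block is valid
        self.u = 0    # U bit: the block is updated only in the cache
        self.mc = mc  # MC bit: an identical copy exists in another cache


class CopyBackCache:
    def __init__(self, memory):
        self.memory = memory  # dict: block address -> data, shared by all caches
        self.blocks = {}
        self.peers = []       # the other caches on the bus
        self.broadcasts = 0   # addresses sent on the broadcast bus

    def holds(self, addr):
        return addr in self.blocks and self.blocks[addr].v == 1

    def write_back(self, addr):
        block = self.blocks[addr]
        if block.v == 1 and block.u == 1:
            self.memory[addr] = block.data
            block.u = 0

    def load(self, addr):
        sharers = [p for p in self.peers if p.holds(addr)]
        for peer in sharers:
            peer.write_back(addr)     # the main memory must hold the latest data before it is read
            peer.blocks[addr].mc = 1  # stands in for the MC bit operation line
        self.blocks[addr] = Block(self.memory[addr], mc=1 if sharers else 0)

    def write_hit(self, addr, data):
        block = self.blocks[addr]
        if block.mc == 1:
            self.broadcasts += 1      # the address is sent only when another copy exists
            for peer in self.peers:
                if peer.holds(addr):
                    peer.blocks[addr].v = 0
            block.mc = 0              # assumption: no other copy remains after the invalidation
        block.data = data
        block.u = 1                   # the main memory is now stale

    def expel(self, addr):
        self.write_back(addr)         # written back only if V = 1 and U = 1
        del self.blocks[addr]


memory = {0x40: "old"}
c1, c2 = CopyBackCache(memory), CopyBackCache(memory)
c1.peers, c2.peers = [c2], [c1]
c1.load(0x40)
c2.load(0x40)
c1.write_hit(0x40, "new")
print(memory[0x40], c2.holds(0x40))  # the main memory is still stale; the other copy is invalid
c1.expel(0x40)
print(memory[0x40])
```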
On the other hand, pages 273 to 298 of "ACM Transactions on Computer Systems", Vol. 4, No. 4, November 1986, disclose a variety of methods (i.e., cache coherence protocols) for keeping the stored contents coincident among a plurality of caches when writes are executed according to the copy-back method in a shared-bus multiprocessor system.
One method is called the "synapse method", which has a major feature that the main memory contains in connection with each cache block a single bit tag indicating whether or not it should respond to the miss of the block. In case the cache has an updated copy of the block, this bit informs the main memory of the fact that the response is unnecessary. Moreover, the cache block in the multiprocessor system is in one of invalid, valid (i.e., non-updated so that it may possibly be shared) and updated (i.e., dirty with no other copy) states.
In the multiprocessor system according to this synapse method, it is assumed that the write access from the first processor hits the first cache relating to the processor. At this write hit, the first cache is shifted from the valid state to the updated state. In the absence of the invalidating signal, moreover, the processing of the first cache is identical to that of the write miss, as will be described in the following.
In this synapse multiprocessor system, on the other hand, it is assumed that the write access from the first processor fails to hit the first cache relating to the processor. Upon this write miss, the data of the block is transferred like the read miss from the main memory to the first cache. If, at this transfer, a valid block copy is present in another cache connected to the shared bus, the cache is invalidated. By this writing operation, moreover, the first cache is loaded with the block to take the updated (i.e., dirty) state. Moreover, the block tag in the main memory is set so that the main memory ignores the subsequent request for the pertinent block. By the method thus far described, a coincidence of the stored content of the cache in the synapse multiprocessor system can be attained.
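The write handling of the synapse method described above may be sketched as a state transition function. The function name, the list representation of the cache states and the Boolean form of the main memory tag are assumptions of this sketch. The case in which another cache already holds the block in the updated state is not described in the text above and is therefore rejected rather than guessed.

```python
INVALID, VALID, DIRTY = "invalid", "valid", "dirty"


def synapse_write(states, writer, memory_responds):
    """Apply one processor write to a block.

    states holds the state of the block in each cache, writer is the index of
    the cache receiving the write, and memory_responds models the single bit
    tag in the main memory (True: the main memory answers a miss).
    Returns (new_states, new_memory_responds, block_transferred).
    """
    if any(s == DIRTY for i, s in enumerate(states) if i != writer):
        raise NotImplementedError("an updated copy in another cache is outside this sketch")
    new_states = list(states)
    if states[writer] == DIRTY:
        return new_states, memory_responds, False  # hit on an updated block stays local
    # A write miss, or a write hit on a valid block that is processed like a write miss.
    for i in range(len(new_states)):
        if i != writer:
            new_states[i] = INVALID
    new_states[writer] = DIRTY
    return new_states, False, True  # tag set so the main memory ignores later requests


print(synapse_write([VALID, VALID], 0, True))
```

The third element of the result shows that, in this model, a write hit on a valid block still causes a block transfer from the main memory.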
Another method is called the "Berkeley method", whose major features are that a direct cache-to-cache transfer is used in case a block is shared between the caches, and that an updated block is not written back into the main memory in case the updated block is shared between the caches. This method requires not only the three cache block states of invalid, valid (i.e., non-updated and possibly shared) and updated (i.e., dirty) but also a fourth, shared and updated (i.e., shared-dirty and possibly shared) state.
In this Berkeley multiprocessor system, it is assumed that the write access from the first processor hits the first cache relating to the processor. Upon this write hit, the first cache is shifted from the valid state to the updated state. At this time, moreover, other caches connected to the shared bus are invalidated in response to the invalidating signal.
In this Berkeley multiprocessor system, on the other hand, it is assumed that the write access from the first processor fails to hit the first cache relating to the processor. Upon this write miss, the pertinent block is transferred directly to the first cache from the main memory or from another cache in the updated (or dirty) state. Upon this transfer, a third cache having a copy of the pertinent block is invalidated. By this writing operation, moreover, the first cache is loaded with the block to take the updated (or dirty) state. By the method thus far described, a coincidence of the stored content of the cache in the Berkeley multiprocessor system can be attained.
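The write handling of the Berkeley method described above may be sketched as follows. The function name and the string form of the result are assumptions of this sketch. Treating a cache in the shared-dirty state as a possible supplier on a write miss is likewise an assumption, since the text above names only a cache in the updated (or dirty) state.

```python
INVALID, VALID, DIRTY, SHARED_DIRTY = "invalid", "valid", "dirty", "shared-dirty"


def berkeley_write(states, writer):
    """Apply one processor write to a block under the Berkeley method.

    states holds the state of the block in each cache and writer is the index
    of the cache that receives the write. Returns (new_states, source), where
    source names the supplier of the block: "none" on a write hit, otherwise
    "memory" or "cache <index>".
    """
    new_states = list(states)
    source = "none"
    if states[writer] == INVALID:  # write miss
        # Assumption: a shared-dirty cache may supply the block as well as a dirty one.
        owners = [i for i, s in enumerate(states) if s in (DIRTY, SHARED_DIRTY)]
        source = "cache %d" % owners[0] if owners else "memory"
    for i in range(len(new_states)):
        if i != writer:
            new_states[i] = INVALID  # effect of the invalidating signal
    new_states[writer] = DIRTY
    return new_states, source


print(berkeley_write([VALID, VALID, INVALID], 0))  # write hit
print(berkeley_write([INVALID, DIRTY, INVALID], 0))  # write miss served by another cache
```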
By the methods thus far described according to the prior art, the problems of the data coincidence of the caches of the multiprocessor system can be solved. If, however, the multiprocessor system is actually constructed in accordance with the concepts of the prior art, the various additional problems described in the following will arise, as our investigation has clarified.
A first problem, arising in the method disclosed in Japanese Patent Publication No. 52-14064, is that complicated hardware is required in order that each cache block may store data concerning whether or not an identical copy exists in another cache. Let the case be considered, for example, in which the first cache reads and transfers a certain block from the main memory and in which the second cache then reads and transfers it from the main memory. Since, in this case, a plurality of copies come to exist at the time of the read and transfer by the second cache, the MC bits (i.e., Multi-Copy bits) of both the first and second caches have to be set at "1". The first cache can set its MC bit at "1" if it watches the read access on the main memory bus, but the second cache cannot determine the value of its MC bit by itself, so that it has to operate the MC bit in response to data from the first cache by using some data transmission means such as the MC bit operation line 208 in FIG. 10. This needs excess hardware. Moreover, this operation is delayed because the signal has to go back and forth between the caches. Therefore, the cache has to write the MC bit in itself at an exceptional timing which is delayed by a predetermined time from the start of the bus cycle. However, this exceptional timing is not always equal to that for storing the data which is read out from the main memory. This makes it difficult to design a high-speed system, and, even if such a design is possible, complex logic is required for managing the timing.
A second problem is that, in the case of a write hit of the first cache in the well-known synapse multiprocessor system, the block, which ordinarily needs no transfer because of the hit, is nevertheless transferred from the main memory to the first cache. For this data transfer, the main memory has to be subjected to the read access. Since, however, the main memory having a large capacity generally takes a long access time, the data transfer is difficult to speed up.
A third problem arises in the known synapse multiprocessor system and lies in the burden that the single bit tag indicating whether or not the main memory should respond to the miss of the block has to be not only contained in the main memory in connection with each cache block but also controlled in response to the shift of the state of the cache.
A fourth problem arises in the known Berkeley multiprocessor system and is that, since the caches have as many as four states, the hardware of a cache state control unit for controlling the states of a cache in response to the access from the processor and the address from the shared main memory bus is large in size and complicated in logic.
A fifth problem also arises in the known Berkeley multiprocessor system and is that, if the number of processors and the corresponding number of caches are increased, the data of a cache in the updated (or dirty) state has to be transferred to the other caches one by one, so that the access from the pertinent processor to the cache storing the data in the updated state is stopped during each transfer. If the read access from the first processor fails to hit the first cache, for example, the pertinent block is transferred from the second cache in the updated (or dirty) state directly to the first cache. As a result, the first cache is shifted from the invalid state to the valid state, and the second cache is shifted from the updated state to the shared-dirty state. Next, in case the read access from the third processor fails to hit the third cache, the pertinent block is transferred from the second cache in the shared-dirty state directly to the third cache. As a result, the third cache is shifted from the invalid state to the valid state, and the second cache keeps the shared-dirty state. In case, moreover, the read access from the N-th processor fails to hit the N-th cache, similar results come out, so that the access from the second processor to the second cache is continuously interrupted during those continuous direct transfers.
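The sequence of direct transfers in the fifth problem may be sketched as follows. The function names and the zero-based cache indices are assumptions of this sketch, which merely counts how many times the cache holding the updated block has to supply it when every other cache causes a read miss in turn.

```python
INVALID, VALID, DIRTY, SHARED_DIRTY = "invalid", "valid", "dirty", "shared-dirty"


def berkeley_read_miss(states, reader):
    """Return (new_states, supplier) for a read miss in cache `reader`.

    supplier is the index of the cache that transfers the block directly,
    or None when the block comes from the main memory.
    """
    new_states = list(states)
    owners = [i for i, s in enumerate(states) if s in (DIRTY, SHARED_DIRTY)]
    supplier = owners[0] if owners else None
    if supplier is not None:
        new_states[supplier] = SHARED_DIRTY  # the supplier shifts to or keeps the shared-dirty state
    new_states[reader] = VALID
    return new_states, supplier


def count_owner_transfers(num_caches, owner):
    """Let every other cache miss once and count the transfers made by `owner`."""
    states = [INVALID] * num_caches
    states[owner] = DIRTY
    transfers = 0
    for reader in range(num_caches):
        if reader == owner:
            continue
        states, supplier = berkeley_read_miss(states, reader)
        if supplier == owner:
            transfers += 1
    return transfers, states


print(count_owner_transfers(4, 1))
```

In this model the number of transfers made by the supplying cache grows in proportion to the number of other caches, which corresponds to the continuous interruption described above.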
Another problem is an elongation of the wait time for the main memory in case another bus master (e.g., another cache or an input/output device) requests the use of the bus during a block transfer relating to the data coincidence. This obstructs the smooth transfer of the bus ownership of the main memory bus. In Japanese Patent Publication No. 52-14064, for example, when the write access misses the first cache, the data corresponding to the address of the write access is invalidated at first if it belongs to one block of another cache. Next, said one block is transferred (i.e., block-transferred) from the main memory to the first cache, and this first cache then writes the data of the write access on the block-transferred data. Since, on the other hand, the block-transferred one block is composed of a plurality of (e.g., four) words, its transfer from the main memory to the first cache requires a plurality of bus accesses. Therefore, the first cache writes the data of the write access on the data of the block-transferred first word. If, on the other hand, another bus master, i.e., the second cache or the input/output device, tries to access the main memory during the plural accesses of the block transfer, the access from that bus master may not be accepted. This is because that bus master may have requested the very block data being block-transferred from the main memory by the first cache. However, after the transfer of the first word of the one block block-transferred to the first cache having caused the write miss, the state of the first cache has already been shifted from the invalid state to the updated state. Therefore, the block data to be block-transferred from the main memory is not yet wholly prepared in the first cache having been shifted to the updated state, so that the one block of complete words cannot be written back from the first cache to the main memory.
In case, therefore, during the block transfer to the first cache having caused the write miss, another bus master accesses the data of any block in the updated state in the first cache, the access of that bus master to the pertinent block is always made to wait till the end of the block transfer from the main memory to the first cache. Thus, in the well-known prior art before the present invention, there is no device that relinquishes the ownership of the main memory bus to another bus master during the plural main memory bus accesses of a block transfer without causing any increase in the aforementioned wait time for the main memory.
The present invention has been conceived on the basis of the aforementioned results of our investigation and aims to provide a multiprocessor system capable of eliminating at least some of the aforementioned problems of the prior art.