This application is based on Japanese Patent Application No. 8-238,157, filed Sep. 9, 1996, the content of which is incorporated herein by reference.
The present invention relates to a cache flush apparatus for a cache memory having a snoop mechanism for maintaining data coherency and a fault tolerant computer system having the cache flush apparatus.
In general, a modern high-speed processor has a cache memory for temporarily holding data required by the processor in order to reduce the effective memory access latency. A cache memory holds data and the memory address at which the data is read out in a fixed size data storage unit called a cache block.
In a computer (more specifically a multi-processor computer) having a plurality of processors each of which has its own cache memory, a snoop mechanism is usually used to maintain data coherency among the cache memories. A snoop mechanism monitors the system bus to detect bus commands and if a bus command which requires some action of the cache memory is detected, the snoop mechanism does the required action such as replying to the bus command with data held in one of its cache blocks, discarding data in one of its cache blocks and the like.
The cache memory is classified into copy-back type and write-through type. While a write-through type cache memory writes back data into the main memory immediately when it is updated within the cache memory, a copy-back type cache memory postpones the write-back of data until it becomes necessary. Therefore, for a copy-back type cache memory, a cache block may hold updated data which has not been written back into the main memory yet. Such a cache block is called a dirty block and the state of such a cache block is called dirty.
A copy-back type cache memory requires a cache flush operation which writes back all the data updated solely within the cache memory into the main memory.
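By way of illustration only, the copy-back behavior and the cache flush operation described above can be sketched as follows. The class name `CopyBackCache`, its methods, and the dictionary standing in for the main memory are hypothetical and serve only to show when a block is dirty and what a flush does to it.

```python
class CopyBackCache:
    """Toy copy-back cache: updates stay in the cache until flushed."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for the main memory
        self.blocks = {}       # address -> (data, dirty flag)

    def write(self, address, data):
        # The update is held only in the cache, so the block becomes dirty.
        self.blocks[address] = (data, True)

    def flush(self):
        # Write back every dirty block and keep it valid but clean.
        for address, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self.memory[address] = data
                self.blocks[address] = (data, False)
```

In this sketch the main memory holds stale data between `write` and `flush`, which is the situation a cache flush is meant to resolve.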
For example, data transfer between an input/output device without a memory coherency mechanism and the main memory requires a cache flush before the data transfer in order to assure that the main memory holds valid data. From now on, a "cache memory" means a copy-back type cache memory.
A cache flush is also necessary for a main memory based checkpoint/rollback type fault tolerant computer. Such a fault tolerant computer periodically creates a checkpoint within its main memory. When the computer detects some faults during the normal data processing, the computer rolls back its internal state to the most recent checkpoint and then restarts the normal data processing.
Since a checkpoint image created within the main memory should contain sufficient information to restart the normal data processing, it is necessary to perform a cache flush as a part of checkpoint creation.
A cache flush is usually performed by using machine instructions provided by the processor.
The Intel Pentium(trademark) processor and its successors, for example, provide a "wbinvd" (writeback and invalidate) instruction. The instruction writes back data of all the dirty blocks into the main memory and sets the state of every cache block to invalid. Therefore, when a cache flush has been performed by executing the "wbinvd" instruction as a part of checkpoint creation, cache misses occur very frequently during the normal data processing which follows the checkpoint creation.
The MIPS R4000(trademark) processor, for example, provides a "secondary cache hit writeback" instruction. Unlike the "wbinvd" instruction, this instruction takes a single cache block as its operand. The following is a sample program of a cache flush operation.
fcflush
With the foregoing program, "cache 0x1b, 0($4)" is the "secondary cache hit writeback" instruction. The instruction checks the state of the cache block of a secondary cache memory designated by the contents of the fourth register ($4). If the state is dirty, the data of the cache block is written back into the main memory and the state of the cache block turns "clean-exclusive" or "shared". The loop must be repeated as many times as there are cache blocks in the secondary cache memory, even if the number of dirty blocks is small. It should be mentioned that the execution time of a cache instruction is usually much longer than that of an ordinary instruction such as an arithmetic instruction.
The SPARC V9(trademark) processor does not provide any instruction for a cache flush. If a cache flush is necessary, a load instruction should be used so that the data within a dirty block is replaced by the data newly loaded into the cache block. Therefore, a cache flush is even less efficient than in the case of the Intel Pentium processor.
Accelerating the cache flush operation has been a concern for designers of main memory based checkpoint/rollback type fault tolerant computers. Japanese patent disclosure (KOKAI) No. 5-6308, "Cache controller, fault tolerant computer and data transfer method", Mitsubishi Denki Co. Ltd., proposed a cache controller with additional memory for storing the memory address at which a piece of data is updated. In a cache flush operation, while a conventional cache controller checks the state of every cache block and, if it is dirty, writes back the data held in the cache block, the proposed cache controller can use the addresses stored in the additional memory effectively. However, this method has a critical disadvantage in that it requires a major modification of the present cache controller design, which is too costly.
In view of the foregoing, the primary object of the present invention is to provide a cache flush apparatus applicable to a variety of commercial high speed microprocessors having cache memories. The cache flush apparatus, attached to the system bus, maintains the memory addresses held in dirty blocks within its own storage during the normal data processing. When a cache flush is required, the cache flush apparatus reads the memory addresses from the storage efficiently and issues bus commands each of which requests a write-back of the data held in one of the dirty blocks. As a result, a dirty block becomes "shared" and it still holds the same data. A non-dirty cache block remains unchanged.
According to a first aspect of the present invention, there is provided a cache flush apparatus for use in a computer having at least one processor provided with a copy-back type cache memory having a bus snoop mechanism, a main memory and a system bus for connecting the at least one processor and the main memory, the cache flush apparatus comprising:
update address registering means for monitoring the system bus to detect an update of data within the cache memory, selecting a region of the update address storage means according to a memory address MU at which the data has been updated and an identifier of a processor which has updated the data and storing the memory address MU as an update address in one of the entries of the selected region;
update address removing means for monitoring the system bus to detect a write-back of data from a dirty block, selecting a region of the update address storage means according to a memory address MW at which the data has been written back and an identifier of a processor from which the data has been written back, and removing the update address which is equal to the memory address MW and is stored in an entry of the selected region; and
flush executing means, in response to a request from the at least one processor, for issuing bus commands to the system bus each of which has one of the update addresses UA stored in the update address storage means and causes a write-back of data from the dirty block designated by the update address UA.
According to the first aspect of the present invention, the address of data held in a dirty block is stored in an entry of the selected region of the update address storage means. When a processor updates some data and as a result the state of a cache block turns dirty, the memory address MU is captured and stored in an empty entry of the selected region of the update address storage means. When a write-back of data from a dirty block occurs (which means the state of the cache block turns "shared"), the update address is removed from the update address storage means. The reason why the update address storage means is divided into regions and the update address registering means and the update address removing means select a region is that the removal operation can be accelerated by reducing the number of entries which must be examined.
The flush executing means, reading out all the update addresses from the update address storage means, issues bus commands to the system bus each of which has one of the update addresses UA and requires write-back of data from the dirty block designated by the update address UA.
As a result, the cache flush apparatus causes write-back of all the data from the dirty blocks and changes the state of the dirty blocks to some state other than invalid (typically "shared") and does not affect the other cache blocks.
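By way of illustration only, the bookkeeping of the first aspect can be sketched in software as follows. The class `UpdateAddressStore`, the 64-byte block size, and in particular the region selection rule are assumptions made for this sketch; the text above states only that a region is selected according to the memory address and the processor identifier.

```python
class UpdateAddressStore:
    """Toy model of region-partitioned update address bookkeeping."""

    def __init__(self, num_regions, block_size=64):
        self.num_regions = num_regions
        self.block_size = block_size
        # Each region holds the update addresses assigned to it.
        self.regions = [set() for _ in range(num_regions)]

    def _region(self, address, cpu_id):
        # Hypothetical selection rule combining block index and processor id.
        return (address // self.block_size + cpu_id) % self.num_regions

    def register(self, address, cpu_id):
        # Called when an update of data (block turns dirty) is observed.
        self.regions[self._region(address, cpu_id)].add(address)

    def remove(self, address, cpu_id):
        # Called when a write-back is observed; only one region is searched.
        self.regions[self._region(address, cpu_id)].discard(address)

    def flush(self, issue_write_back):
        # Issue one write-back request per stored update address.
        for region in self.regions:
            for update_address in sorted(region):
                issue_write_back(update_address)
            # Simplification: assume each request produced a write-back.
            region.clear()
```

Because `remove` examines only the selected region, the cost of a removal depends on the size of one region rather than on the total number of stored addresses, which is the stated reason for dividing the storage into regions.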
According to a second aspect of the present invention, there is provided a cache flush apparatus for use in a computer having at least one processor provided with a copy-back type and direct map cache memory having a bus snoop mechanism, a main memory and a system bus for connecting the at least one processor and the main memory, the cache flush apparatus comprising:
update address storage means having a plurality of regions each of which corresponds to a cache block and has one entry for storing the memory address of data held in the corresponding cache block if the cache block's state is dirty;
update address registering means for monitoring the system bus to detect an update of data within the cache memory, selecting a region of the update address storage means according to a memory address MU at which the data has been updated and an identifier of a processor which has updated the data and storing the memory address MU as an update address in the entry of the selected region;
update address removing means for monitoring the system bus to detect a write-back of data from a dirty block, selecting a region of the update address storage means according to a memory address MW at which the data has been written back and an identifier of a processor from which the data has been written back, comparing the update address stored in the entry of the selected region with the memory address MW, and if they are the same, removing the update address from the entry; and
flush executing means, in response to a request from the at least one processor, for issuing bus commands to the system bus each of which has the update address UA stored in the entry of a region and causes a write-back of data from the dirty block designated by the update address UA.
According to the second aspect of the present invention, the cache flush apparatus can, in an efficient way, cope with a case where each processor has a write buffer. A write buffer is a hardware module equipped between a processor (including cache memory) and a system bus and holds several bus commands for write requests when the system bus is busy.
In this case, it appears to the cache flush apparatus as if a cache block could hold more than one dirty piece of data.
To cope with this case, each region, assigned to one of the cache blocks, consists of an entry for storing the memory address of the latest data update relating to the cache block. When a write-back of data is detected by the update address removing means, the update address stored in the entry of the selected region and the memory address of the write-back are compared. If they are the same, the update address is removed from the entry.
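By way of illustration only, the compare-before-remove rule of the second aspect can be sketched as follows for a single processor. The class `DirectMapStore`, the block size, and the index computation are hypothetical; the processor identifier is omitted for brevity.

```python
class DirectMapStore:
    """Toy model: one entry per cache block of a direct map cache."""

    def __init__(self, num_blocks, block_size=64):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.entries = [None] * num_blocks

    def _index(self, address):
        # Hypothetical direct map index derived from the address.
        return (address // self.block_size) % self.num_blocks

    def register(self, address):
        # The entry always holds the latest update for its cache block.
        self.entries[self._index(address)] = address

    def remove(self, address):
        # Remove only if the written-back address is the stored one.
        i = self._index(address)
        if self.entries[i] == address:
            self.entries[i] = None

    def pending(self):
        # Update addresses the flush executing means would have to process.
        return [ua for ua in self.entries if ua is not None]
```

If a delayed write-back of an older address arrives from the write buffer after a newer update to the same cache block has been registered, the comparison fails and the newer update address is kept.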
According to a third aspect of the present invention, there is provided a cache flush apparatus for use in a computer having at least one processor provided with a copy-back type and direct map cache memory having a bus snoop mechanism, a main memory and a system bus for connecting the at least one processor and the main memory, the cache flush apparatus comprising:
update address storage means having a plurality of regions each of which corresponds to a cache block and has one counter and one entry for storing the memory address of data held in the corresponding cache block if the cache block's state is dirty;
update address registering means for monitoring the system bus to detect an update of data within the cache memory, selecting a region of the update address storage means according to a memory address MU at which the data has been updated and an identifier of a processor which has updated the data, storing the memory address MU as an update address in the entry of the selected region, and incrementing the counter of the selected region;
update address removing means for monitoring the system bus to detect a write-back of data from a dirty block, selecting a region of the update address storage means according to a memory address MW at which the data has been written back and an identifier of a processor from which the data has been written back, and decrementing the counter of the selected region; and
flush executing means, in response to a request from the at least one processor, for issuing bus commands to the system bus each of which has the update address UA stored in the entry of a region with a non-initial counter value and causes a write-back of data from the dirty block designated by the update address UA.
According to the third aspect of the present invention, the cache flush apparatus can, like the second aspect, cope with a case where a processor has a write buffer.
To cope with this case, each region, corresponding to one of the cache blocks, consists of one counter and one entry. The update address registering means increments the counter when an update address is stored in the entry of the corresponding region. The update address removing means decrements the counter when a write-back of data from the corresponding cache block is detected. Therefore a counter holds the number of updated pieces of data which exist within the write buffer and the cache memory and correspond to the cache block. The flush execution means determines that an entry has a valid update address if the associated counter has a non-initial value.
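By way of illustration only, the counter scheme of the third aspect can be sketched as follows. The class `CountedStore` and its index computation are hypothetical, the initial counter value is assumed to be zero, and the processor identifier is again omitted.

```python
class CountedStore:
    """Toy model: one counter and one entry per cache block."""

    def __init__(self, num_blocks, block_size=64):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.entries = [None] * num_blocks
        self.counters = [0] * num_blocks   # assumed initial value: 0

    def _index(self, address):
        return (address // self.block_size) % self.num_blocks

    def register(self, address):
        i = self._index(address)
        self.entries[i] = address
        self.counters[i] += 1

    def remove(self, address):
        # No comparison is needed; the counter alone is decremented.
        self.counters[self._index(address)] -= 1

    def pending(self):
        # An entry is valid only while its counter is non-initial.
        return [ua for ua, count in zip(self.entries, self.counters)
                if count != 0]
```

Compared with the second aspect, the removal path needs no address comparison; the counter tracks how many updated pieces of data for the cache block are still outstanding.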
For most computer systems, "read-line" is a bus command which causes a write-back of data from a dirty block. However, there are computer systems where no bus command is provided which requests a write-back of data into the main memory.
In this case, it is effective to provide a data path which enables the processor to read out the update addresses from the update address storage means. Then the processor, reading out each update address, executes a cache block write-back instruction for a dirty block or a load instruction which requests a data replacement from a dirty block.
According to a fourth aspect of the present invention, there is provided a cache flush apparatus for use in a computer having at least one processor provided with a copy-back type cache memory having a bus snoop mechanism, a main memory and a system bus for connecting the at least one processor and the main memory, the cache flush apparatus comprising:
update address storage means having a plurality of regions each of which has one or more entries for storing the addresses of data held in dirty blocks;
update address registering means for monitoring the system bus to detect an update of data within the cache memory, selecting a region of the update address storage means according to a memory address MU at which the data has been updated and an identifier of a processor which has updated the data, selecting an entry of the selected region, if the selected entry has an update address UA, issuing a bus command which causes a write-back of data from the cache block designated by the update address UA, and storing the memory address MU as an update address in the selected entry; and
flush executing means, in response to a request from the at least one processor, for issuing bus commands to the system bus each of which has one of the update addresses UA stored in the update address storage means and causes a write-back of data from the dirty block designated by the update address UA.
According to the fourth aspect of the present invention, the cache flush apparatus, with a reasonable amount of hardware, can cope with a case where a processor has a non-direct-map cache memory as well.
Comparing the fourth aspect with the first aspect, the update address registering means of the fourth aspect is more complicated but the update address removing means is omitted.
Before explaining the fourth aspect, suppose a case where a cache flush apparatus of the first aspect is applied to a processor having a four-way set associative cache memory.
Since a region is selected according to the memory address and the processor identifier and a piece of updated data exists in one of the four cache blocks from the same way, a region must consist of at least four entries.
When a write-back of data is detected, the update address removing means has to examine the four entries to find out the entry having the specified update address. If the processor has a write buffer, the cache flush apparatus would be much more complicated. Thus the cache flush apparatus of the first aspect would be as complicated as the cache controller in case of a non-direct-map cache memory.
According to the fourth aspect, by augmenting the function of the update address registering means, the update address removing means of the first aspect can be omitted.
Since a cache flush apparatus of the fourth aspect does not have an update address removing means, when a dirty block writes back its data, the update address is not removed from the update address storage means.
When an update of data is detected by the update address registering means, it selects a region according to the memory address MU and the processor identifier. And then it selects an entry of the region. If the selected entry holds an update address UA, the update address registering means issues a bus command which causes a write-back of data stored in the dirty block designated by the update address UA. If the designated dirty block exists, the dirty block writes back its data. Otherwise the bus command is ignored. Then, the update address registering means stores the memory address MU as an update address in the entry.
Thus, the update address registering means of the fourth aspect, different from that of the first aspect, changes the state of a cache block by issuing a bus command in order to assure that when an update address is removed from the update address storage means, the corresponding dirty block (if any) writes back its data at the same time.
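By way of illustration only, the registering behavior of the fourth aspect can be sketched as follows. The class `EvictingStore`, the round-robin choice of the entry to reuse, and the skipping of an address that is already stored are assumptions of this sketch; the text above does not prescribe how an entry is selected. The processor identifier is omitted.

```python
class EvictingStore:
    """Toy model: no removing means; a reused entry triggers a write-back."""

    def __init__(self, num_regions, entries_per_region, issue_write_back,
                 block_size=64):
        self.regions = [[None] * entries_per_region
                        for _ in range(num_regions)]
        self.next_slot = [0] * num_regions   # hypothetical round-robin state
        self.issue_write_back = issue_write_back
        self.block_size = block_size

    def register(self, address):
        r = (address // self.block_size) % len(self.regions)
        region = self.regions[r]
        if address in region:
            return                      # assumption: avoid duplicate entries
        if None in region:
            slot = region.index(None)   # prefer an empty entry
        else:
            slot = self.next_slot[r]
            self.next_slot[r] = (slot + 1) % len(region)
            # Force the old block (if still dirty) to write back first.
            self.issue_write_back(region[slot])
        region[slot] = address

    def flush(self):
        for region in self.regions:
            for i, update_address in enumerate(region):
                if update_address is not None:
                    self.issue_write_back(update_address)
                    region[i] = None
```

An entry may hold a stale address whose block has already been written back; the resulting bus command is then simply ignored, at the cost of some bus traffic.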
In addition to the advantage of easier adaptation to a non-direct-map cache memory, there are other advantages about mapping the regions of the update address storage means to the cache memory configuration.
It is possible to organize the update address storage means as a single region which would omit the region selection mechanism of the update address registering means.
It is also possible to reduce the number of entries of a region. For example, it is reasonable that each region of the update address storage means applied to a four-way set associative cache memory has only two entries. This is because it does not happen frequently that over one half of the cache blocks belonging to the same way hold dirty data at the same time.
It should be noted, however, that an oversimplified hardware implementation of the update address storage means would cause a critical performance degradation because it increases the number of bus commands issued by the update address registering means.
It is also possible that a region of the update address storage means is selected only according to the memory address MU at which the data has been updated (the processor identifier is not used).
According to a fifth aspect of the present invention, there is provided a cache flush apparatus for use in a computer having at least one processor provided with a copy-back type cache memory having a bus snoop mechanism, a main memory and a system bus for connecting the at least one processor and the main memory, the cache flush apparatus comprising:
update address storage means having a plurality of regions each of which has one or more entries for storing the addresses of data held in dirty blocks;
update address registering means for monitoring the system bus to detect an update of data within the cache memory, selecting a region of the update address storage means according to a memory address MU at which the data has been updated and an identifier of a processor which has updated the data, selecting an entry of the selected region, if the selected entry has an update address UA, issuing a bus command which causes a write-back of data from the cache block designated by the update address UA, and storing the memory address MU as an update address in the selected entry;
update address removing means for monitoring the system bus to detect a write-back of data from a dirty block, selecting a region of the update address storage means according to a memory address MW at which the data has been written back and an identifier of a processor from which the data has been written back, trying to remove the update address which is equal to the memory address MW from an entry of the selected region for a predetermined time; and
flush executing means, in response to a request from the at least one processor, for issuing bus commands to the system bus each of which has one of the update addresses stored in the update address storage means and causes a write-back of data from the dirty block designated by the update address.
According to the fifth aspect of the present invention, the update address removing means selects a region and tries to find out an entry of the region having the update address which is equal to the memory address MW for a predetermined period. If it is successful, the cache flush apparatus behaves like that of the first aspect. Otherwise, the update address would remain in the update address storage means and the update address registering means will work well like that of the fourth aspect.
Compared with the fourth aspect, the fifth aspect has an advantage that the number of bus commands issued by the update address registering means is reduced thanks to the update address removing means.
It is preferable that the update address removing means continues searching of the selected region until a next bus command to be processed is detected, not until a predetermined time period passes.
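By way of illustration only, the time-limited removal of the fifth aspect can be sketched as follows. The function `try_remove` is hypothetical, and the predetermined time is modeled here as a maximum number of entries examined, which is an assumption of this sketch.

```python
def try_remove(region, address, max_probes):
    """Search at most max_probes entries; return True if address was removed."""
    for i in range(min(max_probes, len(region))):
        if region[i] == address:
            region[i] = None
            return True
    # Budget exhausted: the update address (if any) stays in the region.
    return False
```

When the search gives up, the stale update address remains stored and is later handled by the update address registering means as in the fourth aspect.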
Other than data write-back bus commands, there is another kind of bus command which shows that there is no dirty block having a certain memory address. For example, suppose a cache memory issues a "read-line" bus command and there is no response from the other cache memories. The request/response pair indicates that there is no dirty block having the memory address included in the "read-line" bus command. Thus, the update address removing means of the fourth and the fifth aspect, detecting such a request/response pair, can do the same operation as in the case of a data write-back bus command.
According to a sixth aspect of the present invention, there is provided a cache flush apparatus according to the fourth aspect, in which
each region of the update address storage means has a dirty block counter for counting the number of dirty blocks corresponding to the region; and
the update address registering means has means for incrementing the dirty block counter of the selected region when an update of data within a cache memory is detected, and characterized by further comprising:
decrement means for monitoring the system bus to detect a write-back of data from a dirty block, selecting a region of the update address storage means according to the memory address MW at which the data has been written back and the identifier of the processor from which the data has been written back, and decrementing the dirty block counter of the selected region; and
entry reclaiming means, when the value of a dirty block counter is decremented to the initial value, for making all the entries of the selected region empty.
According to the sixth aspect of the present invention, the dirty block counter of each region is properly maintained so that it indicates the exact number of dirty blocks corresponding to the region. Thus, when the dirty block counter of a region is decremented to the initial value, there is no dirty block corresponding to the region and so the update addresses can be removed from all the entries of the region (if any).
Owing to the dirty block counter and the entry reclaiming means, the number of bus commands issued by the update address registering means is reduced.
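By way of illustration only, the dirty block counter and the entry reclaiming of the sixth aspect can be sketched for a single region as follows. The class `CountedRegion` is hypothetical, the initial counter value is assumed to be zero, and the path taken when no empty entry exists (the bus command of the fourth aspect) is omitted.

```python
class CountedRegion:
    """Toy model of one region with a dirty block counter."""

    def __init__(self, num_entries):
        self.entries = [None] * num_entries
        self.dirty_count = 0   # assumed initial value: 0

    def on_update(self, address):
        # Simplification: an empty entry is assumed to exist
        # (list.index raises ValueError otherwise).
        slot = self.entries.index(None)
        self.entries[slot] = address
        self.dirty_count += 1

    def on_write_back(self, address):
        # The entries are not searched; only the counter is decremented.
        self.dirty_count -= 1
        if self.dirty_count == 0:
            # No dirty block maps to this region: reclaim all entries.
            self.entries = [None] * len(self.entries)
```

Reclaimed entries can then be reused without a bus command, which is how the counter reduces the number of bus commands issued by the update address registering means.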
According to a seventh aspect of the present invention, there is provided a cache flush apparatus according to the fifth aspect, in which
each region of the update address storage means includes a dirty block counter for counting the number of dirty blocks corresponding to the region;
the update address registering means has means for incrementing the dirty block counter of the selected region;
the update address removing means has means for decrementing the dirty block counter of the selected region, and characterized by further comprising:
entry reclaiming means, when the value of a dirty block counter is decremented to the initial value, for making all the entries of the selected region empty.
According to the seventh aspect of the present invention, the dirty block counter of each region is properly maintained as in the sixth aspect case.
According to an eighth aspect of the present invention, there is provided a fault-tolerant computer system having processors provided with a copy-back type cache memory having a bus snoop mechanism, a main memory and a system bus for connecting the processors and the main memory and arranged to periodically create a checkpoint within the main memory, comprising:
a cache flush apparatus according to any one of the first to the seventh aspects;
normal data processing step where a processor performs normal data processing with the update address registering means of the cache flush apparatus running;
checkpoint creating step where the processor creates a checkpoint in the main memory by writing the context of the processor and activating the flush executing means; and
rollback and recovery means where, when a fault occurs and the computer cannot continue the normal data processing step or the checkpoint creating step any more, the processor rolls back the main memory to the state of the most recent checkpoint and restarts the normal data processing step.
A main memory based checkpoint/rollback type fault tolerant computer periodically creates a checkpoint in its main memory. The flush executing means of the cache flush apparatus accelerates a cache flush and advantageously the state of the cache blocks remains valid. Therefore a main memory based checkpoint/rollback type fault tolerant computer with the cache flush apparatus thereof increases the system performance considerably.
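By way of illustration only, the alternation of normal data processing, checkpoint creation, and rollback can be sketched as follows. The function `run`, the exception `Fault`, the fixed checkpoint interval, and the use of a copied dictionary as the checkpoint image are assumptions of this sketch and do not represent how the checkpoint is actually held in the main memory.

```python
import copy


class Fault(Exception):
    """Hypothetical signal that normal data processing cannot continue."""


def run(steps, memory, flush_cache, interval=2):
    """Run steps, checkpointing every `interval` steps, rolling back on Fault."""
    checkpoint, resume_at = copy.deepcopy(memory), 0
    i = 0
    while i < len(steps):
        try:
            steps[i](memory)            # normal data processing
            i += 1
            if i % interval == 0:
                flush_cache()           # cache flush as part of the checkpoint
                checkpoint, resume_at = copy.deepcopy(memory), i
        except Fault:
            # Roll back to the most recent checkpoint and restart from there.
            memory.clear()
            memory.update(copy.deepcopy(checkpoint))
            i = resume_at
    return memory
```

Since `flush_cache` runs at every checkpoint, its duration and its effect on the subsequent cache hit rate directly affect the overall throughput of such a loop.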
In addition to the above computer system for periodically acquiring checkpoints, the cache flush apparatus according to the present invention can be applied to a duplex computer system in which the primary computer alternately continues normal data processing and sends its checkpoint image to the secondary computer and, when a fault occurs within the primary computer, the secondary computer takes over the normal data processing. The flush executing means of the cache flush apparatus accelerates the checkpoint creation of the primary computer.
Additional objects and advantages of the present invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the present invention.
The objects and advantages of the present invention may be realized and identified by means of the instrumentalities and combinations particularly pointed out in the appended claims.