A cache, main memory, or other temporarily private data storage generally implements a particular write policy or strategy. “Temporarily private data storage” refers to a component of a computer system that temporarily maintains some particular data in a private state (that is, some portion of the computer system can see the particular data while another portion of the computer system cannot). Subsequently, the particular data can be made available to the other portion of the computer system. A scratch pad memory of a processor is an example of temporarily private data storage.
Examples of write strategies include a write through strategy and a write back strategy. The write through strategy is the simpler of the two. In a write through cache, a write operation from the processor causes the data to be transferred to the next level in the memory hierarchy, even on a cache hit. The corresponding entry in the write through cache is also written to and updated.
In a write back cache, on a write operation from the processor, only the entry (on a cache hit) in the write back cache is written to and updated while the content of another level of memory (e.g., the next level of memory or the main memory) remains unaltered. A “dirty” entry refers to an entry (e.g., a line or page) that has been written to and updated but has not yet been updated in another level of memory. A dirty cache entry is subsequently copied to the main memory or to another level of memory in order to update the content there.
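The difference between the two write strategies can be illustrated with a minimal sketch. The class and method names below (Memory, WriteThroughCache, WriteBackCache, flush) are hypothetical and chosen only for illustration. The sketch assumes an unbounded cache in which every write is treated as a hit, so evictions and misses are not modeled.

```python
class Memory:
    """Stands in for the next level of the memory hierarchy (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.writes = 0  # number of writes that reached this level

    def write(self, addr, value):
        self.data[addr] = value
        self.writes += 1


class WriteThroughCache:
    """Every processor write updates the cache entry and the next level."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value        # update the cache entry
        self.memory.write(addr, value)  # always forward to the next level


class WriteBackCache:
    """A processor write updates only the cache entry and marks it dirty."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()  # addresses updated here but not in the next level

    def write(self, addr, value):
        self.lines[addr] = value  # the next level remains unaltered
        self.dirty.add(addr)

    def flush(self):
        """Copy every dirty entry to the next level; return how many were copied."""
        flushed = 0
        for addr in sorted(self.dirty):
            self.memory.write(addr, self.lines[addr])
            flushed += 1
        self.dirty.clear()
        return flushed
```

In this sketch, three successive writes to the same address produce three writes to the next level under the write through strategy, but only one (at flush time) under the write back strategy.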
Generally, dirty cache entries are copied to the main memory or another level of memory after an explicit instruction to clean (or flush) the write back cache, or in certain cases of capacity, conflict, or coherence misses. Some fault-tolerant computer systems cleanse cache memories of dirty lines as part of a checkpoint process. In a checkpoint process, the state of the computer system is periodically recorded (stored) at checkpoint boundaries. In the event of a fault, the computer system can backtrack to a previous state that existed prior to the fault, thereby losing only the time invested between the most recent checkpoint boundary and the time that the fault occurred.
Accordingly, information sufficient to restore the computer system to a state equivalent to the state that existed prior to the fault (for example, a state from which the computer system can satisfactorily restart computation without incorporating incorrect execution, data, or the like) is typically stored. One method of accomplishing this is to cleanse the cache memory of dirty lines at each checkpoint boundary. The dirty lines can be written back to main memory and thereby preserved.
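The checkpoint-and-recover cycle described above can be sketched as follows. This is a simplified model under stated assumptions, not a description of any particular fault-tolerant system: the class CheckpointedSystem and its methods are hypothetical, the cache is unbounded, and the recorded checkpoint state is modeled as a plain copy of main memory.

```python
class CheckpointedSystem:
    """Toy model: write back cache flushed at each checkpoint boundary."""

    def __init__(self):
        self.main_memory = {}  # contents of main memory
        self.cache = {}        # cached lines
        self.dirty = set()     # lines updated in the cache only
        self.checkpoint = {}   # state recorded at the last checkpoint boundary

    def write(self, addr, value):
        # Write back behavior: only the cache entry is updated.
        self.cache[addr] = value
        self.dirty.add(addr)

    def take_checkpoint(self):
        """Cleanse the cache of dirty lines, then record the state."""
        written = len(self.dirty)
        for addr in self.dirty:
            self.main_memory[addr] = self.cache[addr]
        self.dirty.clear()
        # Assumption: the recorded state is a copy of main memory.
        self.checkpoint = dict(self.main_memory)
        return written

    def recover(self):
        """Backtrack to the state recorded at the last checkpoint boundary."""
        self.cache.clear()
        self.dirty.clear()
        self.main_memory = dict(self.checkpoint)
```

After recover() is called, any writes made since the most recent checkpoint boundary are discarded, which corresponds to losing only the work performed between that boundary and the fault.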
A problem in the prior art is that cache flushing at a checkpoint boundary may cause parts of the computer system to operate above an optimum level of utilization, or at a maximum level. For example, at the time of the checkpointing operation, the memory bus may become saturated or may operate above its optimal capacity. This in turn may lead to bottlenecks and excessive queuing of requested operations, thereby increasing latency and stall time of instruction execution.
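The burst of traffic at a checkpoint boundary can be illustrated with a small, purely hypothetical model. The function name and its parameters are assumptions made for this sketch; it assumes that each time step dirties a fixed number of distinct cache lines and that all dirty lines are written back only at checkpoint boundaries.

```python
def bus_writes_per_step(lines_dirtied_per_step, checkpoint_interval, steps):
    """Return the number of memory-bus writes issued in each time step.

    Assumptions (hypothetical model): each step dirties a fixed number of
    distinct lines, and dirty lines are written back only when the cache
    is flushed at a checkpoint boundary.
    """
    dirty = 0
    traffic = []
    for step in range(1, steps + 1):
        dirty += lines_dirtied_per_step
        if step % checkpoint_interval == 0:
            traffic.append(dirty)  # entire backlog hits the bus at once
            dirty = 0
        else:
            traffic.append(0)      # no write back traffic between boundaries
    return traffic


print(bus_writes_per_step(4, 5, 10))
# prints [0, 0, 0, 0, 20, 0, 0, 0, 0, 20]
```

In this model the memory bus is idle (with respect to write backs) for most steps and then receives the whole backlog in a single step, which is the kind of concentrated load that may saturate the bus.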
Thus, what is needed is a method and/or system that can alleviate the impact of the checkpoint process on computer system resources.