A cache, main memory, or other temporarily private data storage generally implements a particular write policy or strategy. “Temporarily private data storage” refers to a component of a computer system that temporarily maintains some particular data in a private state (e.g., some portion of the computer system can see particular data while another portion of the computer system cannot see that data). Subsequently, the particular data can be made available to another portion of the computer system. A scratch pad memory of a processor is an example of temporarily private data storage.
Examples of write strategies include a write through strategy and a write back strategy. The simplest case is the write through strategy. In a write through cache, a write operation from the processor leads to the transfer of the data to a slower level in a memory hierarchy, even with a cache hit. Moreover, an entry in the write through cache may be written to and updated.
In a write back cache, on a write operation from the processor, only the entry (on a cache hit) in the write back cache is written to and updated while the content of another, slower level of memory (e.g., the next slower level of cache or the main memory) remains unaltered. A “dirty” entry refers to an entry (e.g., a line or page and its associated tag or other state information) that has been written to and updated but has not yet been updated in a slower level of memory. A dirty cache entry is subsequently copied to another, slower level of cache or to the main memory in order to update the content there.
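The difference between the two write strategies can be illustrated with a minimal sketch. The class name `ToyCache`, its methods, and the use of a dictionary as the slower level of memory are hypothetical and chosen only for illustration; the sketch ignores capacity limits, line granularity, and tags, and it simply allocates an entry on every write.

```python
class ToyCache:
    """Toy single-level cache in front of a slower memory (a plain dict)."""

    def __init__(self, slower_memory, write_back):
        self.slower_memory = slower_memory  # slower level: address -> value
        self.write_back = write_back        # True: write back, False: write through
        self.entries = {}                   # cached address -> value
        self.dirty = set()                  # addresses updated here but not below

    def write(self, addr, value):
        # The cache entry is written to and updated under both strategies.
        self.entries[addr] = value
        if self.write_back:
            # Write back: the slower level remains unaltered; the entry is dirty.
            self.dirty.add(addr)
        else:
            # Write through: the data is transferred to the slower level at once.
            self.slower_memory[addr] = value

    def clean(self):
        """Copy every dirty entry to the slower level; return how many were copied."""
        copied = len(self.dirty)
        for addr in self.dirty:
            self.slower_memory[addr] = self.entries[addr]
        self.dirty.clear()
        return copied


# Write through: the slower level is updated by the write itself.
slow_a = {0x10: 1}
wt = ToyCache(slow_a, write_back=False)
wt.write(0x10, 2)
wt_value_in_memory = slow_a[0x10]

# Write back: the slower level is stale until the cache is cleaned.
slow_b = {0x10: 1}
wb = ToyCache(slow_b, write_back=True)
wb.write(0x10, 2)
wb_value_before_clean = slow_b[0x10]
entries_copied = wb.clean()
wb_value_after_clean = slow_b[0x10]
```

In this sketch the write back cache holds one dirty entry between the write and the call to `clean()`, which is the window during which the slower level of memory does not reflect the update.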
Generally, dirty cache entries are copied to another, slower level of cache or the main memory after an explicit instruction to clean (or flush) the write back cache, or in certain cases of capacity, conflict, or coherence misses. Some fault-tolerant computer systems cleanse cache memories of dirty entries as part of a checkpoint process. In a checkpoint process, the state of the computer system is periodically recorded (stored) at checkpoint boundaries. In the event of a fault, the computer system can backtrack to a previous state that existed prior to the fault, thereby losing only the time invested between the most recent checkpoint boundary and the time that the fault occurred.
Accordingly, information sufficient to restore the computer system to a state equivalent to the state that existed prior to the fault is typically stored (for example, a state at which the computer system can satisfactorily restart computation without including incorrect execution, data or the like). One method of accomplishing this is to cleanse the cache memory of dirty entries at each checkpoint boundary. The dirty entries can be written back to main memory or other storage and thereby preserved.
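As a rough sketch of cleaning the cache at a checkpoint boundary and later backtracking to that checkpoint, consider the following. All names (`CheckpointedSystem`, `take_checkpoint`, `recover`) are hypothetical. The sketch also assumes that the recorded state is a separate copy of main memory, which is only one possible way to keep the checkpoint from being overwritten by later work; the text above does not prescribe this.

```python
class CheckpointedSystem:
    """Toy system with a write back cache and a single recorded checkpoint."""

    def __init__(self):
        self.main_memory = {}  # address -> value
        self.cache = {}        # write back cache contents
        self.dirty = set()     # cached addresses not yet copied to main memory
        self.checkpoint = {}   # assumed stand-in for the recorded state

    def write(self, addr, value):
        # Write back behavior: only the cache entry is updated.
        self.cache[addr] = value
        self.dirty.add(addr)

    def read(self, addr):
        # Prefer the cached value; fall back to main memory.
        if addr in self.cache:
            return self.cache[addr]
        return self.main_memory.get(addr)

    def take_checkpoint(self):
        # Cleanse the cache of dirty entries so main memory is up to date.
        for addr in self.dirty:
            self.main_memory[addr] = self.cache[addr]
        self.dirty.clear()
        # Assumption: record the state by copying main memory.
        self.checkpoint = dict(self.main_memory)

    def recover(self):
        # Discard all work done since the checkpoint boundary.
        self.cache.clear()
        self.dirty.clear()
        self.main_memory = dict(self.checkpoint)


system = CheckpointedSystem()
system.write(0, "a")
system.take_checkpoint()   # state {0: "a"} is recorded
system.write(0, "b")       # work after the checkpoint boundary
system.write(1, "c")
system.recover()           # simulated fault: backtrack to the checkpoint
```

After `recover()`, only the writes made before the checkpoint boundary remain visible, which corresponds to losing the time invested between the most recent checkpoint boundary and the fault.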
A system conducting checkpointing will typically repeat two phases continuously: a computation phase and a checkpoint phase. During the checkpoint phase, the checkpoint is constructed. In the event of a system failure that can be corrected via the use of the checkpoint, the system will conduct a recovery phase and then possibly continue, perhaps in a reconfigured or degraded state, either with or without further checkpointing.
During the checkpoint phase, execution of user applications is typically not possible, for two reasons. First, cleaning the dirty cache entries consumes significant computer system processing resources, which typically causes execution of user applications to stall until the checkpoint can complete. Second, new work should not be done while the previous work is being recorded, in order to ensure that the previous work is not commingled with any present work. Additionally, cleaning of caches typically saturates memory bandwidth. Preemptive cache cleaning can generally alleviate this saturation only partially, so the checkpoint phase is longer and the computation-phase duty cycle is reduced.
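The effect of a bandwidth-limited checkpoint phase on the duty cycle can be estimated with a simple model. The sketch below assumes that the checkpoint phase lasts roughly as long as it takes to copy the dirty data at the available memory bandwidth, and it takes the duty cycle to be the fraction of each computation-plus-checkpoint cycle spent in the computation phase. The function names and all numbers are hypothetical and are not measurements of any system.

```python
def checkpoint_duration(dirty_bytes, bandwidth_bytes_per_s):
    """Estimated checkpoint time if cleaning is limited only by memory bandwidth."""
    return dirty_bytes / bandwidth_bytes_per_s


def duty_cycle(computation_s, checkpoint_s):
    """Fraction of each cycle spent in the computation phase."""
    return computation_s / (computation_s + checkpoint_s)


# Illustrative numbers only: 2 MB of dirty data, 1 GB/s of write bandwidth,
# and an 8 ms computation phase between checkpoints.
checkpoint_s = checkpoint_duration(2_000_000, 1_000_000_000)  # 0.002 s
cycle_fraction = duty_cycle(0.008, checkpoint_s)              # about 0.8

# Doubling the dirty data lengthens the checkpoint phase and lowers the duty cycle.
longer_checkpoint_s = checkpoint_duration(4_000_000, 1_000_000_000)
lower_fraction = duty_cycle(0.008, longer_checkpoint_s)
```

Under this model, any increase in the amount of dirty data that must be cleaned at the checkpoint boundary directly lengthens the checkpoint phase and reduces the share of time available for user applications.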