Checkpointing is the process by which the memory state of an executing computer program is captured and stored on storage media, such as a disc drive, tape drive or CD-ROM. The stored state is called an image of the computer program at that instant of time. The image can be reloaded into a computer and the software application restarted to execute from the point where the checkpoint was taken. This is useful as a recovery process where a software application has experienced a fault or crashed. The practice of checkpointing is sometimes referred to as taking a back-up, and is a critical feature of most computer systems.
The practice of checkpointing an entire memory state is somewhat inefficient, however, as it requires a storage facility equal in size to the memory of the operating computer system, and it also captures considerable redundant information because most information does not change between checkpoints. Because of this, incremental checkpointing approaches have been proposed, being either page-based or hash-based.
In page-based incremental checkpointing techniques, memory protection hardware and support from a native operating system are required in order to track changed memory pages. The software application memory is divided into logical pages and, using support from the operating system, the checkpointing mechanism marks all changed pages as ‘dirty’. At the time of taking a checkpoint, only the pages that have been marked dirty are stored in the checkpoint file. Of course, at the first checkpoint the full memory state is saved because its entirety is required as a baseline. At the time of a re-start, all of the incremental files and the first full checkpoint file are needed to construct a usable checkpoint file.
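The page-based scheme described above can be illustrated with a minimal sketch. It is a simulation only: dirty pages are tracked in software by a hypothetical PagedMemory class rather than by memory protection hardware and the operating system, and the names PAGE_SIZE, PagedMemory and restore are assumptions made for the illustration, not part of any described system.

```python
PAGE_SIZE = 4096  # assumed page size, chosen only for illustration


class PagedMemory:
    """Toy memory whose writes mark pages dirty (stand-in for OS/hardware tracking)."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.dirty = set()
        self.first_checkpoint_taken = False

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write outside of memory")
        self.data[offset:offset + len(payload)] = payload
        if payload:
            first = offset // PAGE_SIZE
            last = (offset + len(payload) - 1) // PAGE_SIZE
            self.dirty.update(range(first, last + 1))

    def checkpoint(self):
        """Return {page_index: page_bytes}: every page on the first call, dirty pages afterwards."""
        num_pages = (len(self.data) + PAGE_SIZE - 1) // PAGE_SIZE
        if not self.first_checkpoint_taken:
            pages = range(num_pages)  # full baseline checkpoint
            self.first_checkpoint_taken = True
        else:
            pages = sorted(self.dirty)  # incremental checkpoint
        image = {p: bytes(self.data[p * PAGE_SIZE:(p + 1) * PAGE_SIZE]) for p in pages}
        self.dirty.clear()
        return image


def restore(size, checkpoints):
    """Rebuild memory by applying the baseline and then each increment in order."""
    data = bytearray(size)
    for image in checkpoints:
        for page, content in image.items():
            start = page * PAGE_SIZE
            data[start:start + len(content)] = content
    return data
```

In this sketch a re-start calls restore with the baseline image followed by every incremental image in the order taken, which mirrors the requirement that all of the files be available to reconstruct a usable state.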
Hash-based incremental checkpointing uses a hash function to compare and identify changed portions (called ‘blocks’) of memory and saves only those in a checkpoint file. The application memory is divided into fixed-size blocks (which may be independent of an operating system page size). A hash function H( ) maps a block B into a value H(B), being the H-value of the block, which is treated as unique to the contents of the block. After taking a checkpoint, the hash of each memory block is computed and stored in a hash table. At the time of taking the next checkpoint, the hash of each of the blocks is re-computed and compared against the previous hash. If the two hashes differ, then the block is declared changed and it is stored in the checkpoint file.
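The hash-based comparison described above can be sketched as follows. This is an illustrative sketch under stated assumptions: SHA-256 from the Python standard library is used as the hash function H( ), and the block size and the names BLOCK_SIZE, block_hashes and incremental_checkpoint are hypothetical choices made for the example.

```python
import hashlib

BLOCK_SIZE = 1024  # assumed block size, independent of any OS page size


def block_hashes(memory, block_size=BLOCK_SIZE):
    """Compute H(B) for each fixed-size block B of memory."""
    return [hashlib.sha256(memory[i:i + block_size]).digest()
            for i in range(0, len(memory), block_size)]


def incremental_checkpoint(memory, previous_hashes, block_size=BLOCK_SIZE):
    """Return (changed_blocks, new_hash_table).

    changed_blocks maps block index to block contents for every block whose
    hash differs from the previous table. Passing None as previous_hashes
    saves every block, which acts as the first full checkpoint.
    """
    current = block_hashes(memory, block_size)
    changed = {}
    for index, digest in enumerate(current):
        is_new = previous_hashes is None or index >= len(previous_hashes)
        if is_new or digest != previous_hashes[index]:
            start = index * block_size
            changed[index] = bytes(memory[start:start + block_size])
    return changed, current
```

Each call returns the new hash table, which the caller keeps and passes to the next call. Note that every block is re-hashed at every checkpoint, so the saving is in the data stored rather than in the memory scanned.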
U.S. Pat. No. 6,513,050 (Williams et al), issued on Jan. 28, 2003, teaches an example of hash-based incremental checkpointing based on the use of a cyclic redundancy check. A checkpoint which describes a base file is produced by first dividing the base file into a series of segments. For each segment, a segment description is generated which comprises a lossless signature and lossy samples, each describing the segment at a different level of resolution. A segments description structure is created from the generated segment descriptions as the checkpoint. The segments description structure is created by selecting a description that adequately distinguishes the segment from the lower level of resolution.
Both the page-based and hash-based incremental checkpointing techniques still save far more data than may actually be required. This is problematic, particularly as computer systems become larger and more complex, since the storage requirements for checkpointing increase accordingly, which is clearly undesirable.