To provide fault tolerance for critical applications, computing systems often employ some form of checkpointing and rollback mechanism. Checkpointing and rollback enable the state of an application to be saved so that it can be restored to the last known-good state in the event of a failure. Space systems are one domain where radiation can lead to high fault rates, especially when commercial off-the-shelf (COTS) components are used.
One major limitation of checkpointing and rollback schemes is the overhead involved in logging memory transactions so that the system can be restored to a precise, known-good state. Either the exact state of main memory and secondary storage must be logged at each checkpoint, or all transactions must be journaled so that the state can be precisely restored in the event of a rollback. For embedded systems with limited memory and storage resources, the overhead of traditional checkpointing schemes is prohibitive, and such schemes have not typically been used.
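To make the journaling overhead concrete, the following is a minimal sketch of an undo journal over a toy byte-addressed memory. It is illustrative only: the class name `JournaledMemory`, its methods, and the 8-byte memory size are assumptions made for this example and do not describe any particular system.

```python
class JournaledMemory:
    """Toy memory model with an undo journal (hypothetical, for illustration)."""

    def __init__(self, size):
        self.mem = bytearray(size)
        # (address, old_value) pairs recorded since the last checkpoint
        self.journal = []

    def write(self, addr, value):
        # Record the value about to be overwritten, then perform the write.
        self.journal.append((addr, self.mem[addr]))
        self.mem[addr] = value

    def checkpoint(self):
        # The current state becomes the known-good state, so earlier
        # undo records are no longer needed.
        self.journal.clear()

    def rollback(self):
        # Undo writes in reverse order to restore the checkpointed state.
        while self.journal:
            addr, old = self.journal.pop()
            self.mem[addr] = old


mem = JournaledMemory(8)
mem.write(0, 1)
mem.checkpoint()          # known-good state: byte 0 holds 1
mem.write(0, 2)
mem.write(3, 7)
journal_entries = len(mem.journal)   # one record per write since the checkpoint
mem.rollback()
state_after_rollback = bytes(mem.mem)
```

In this sketch every write between checkpoints adds one journal record, so the journal grows with write activity rather than with memory size. That growth is the kind of overhead that becomes difficult to absorb on a memory-constrained embedded system.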