Existing single-core processor chips use a variety of techniques and algorithms to implement fault recovery. Exemplary techniques are disclosed in U.S. Pat. No. 5,504,859 and U.S. Pat. No. 5,692,121. In such systems, a master copy of all the processor's architected facilities is maintained in a recovery unit. The contents of these facilities are referred to as the processor's “checkpointed state”. The modifications that result from the execution of an instruction are allowed to trickle down and update the checkpointed state only after that instruction completes without error. On detection of a recoverable error, the processor executes the following steps:
1) Preserve the checkpointed state by immediately blocking all updates to it.
2) Release all stores, and perform all writes, that have been queued up by previously checkpointed instructions.
3) Re-initialize the protected arrays to their starting state (using the array built-in self-test, or ABIST, engines).
4) Refresh all copies of the architected facilities with the contents of the checkpointed state.
5) Resume execution from the point before the failure was encountered.
6) Make sure the processor achieves forward progress in the execution of the instruction stream (i.e., make sure it does not keep encountering the same, or some other, error before any progress is made).
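The six recovery steps listed above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuitry of the cited patents: the names `RecoveryUnit`, `Core`, `recover`, `retry_instruction`, and `retry_limit` are hypothetical, the ABIST engines are approximated by zeroing a dictionary, and a bounded retry count stands in for the forward-progress check.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryUnit:
    """Hypothetical model of the recovery unit holding the checkpointed state."""
    checkpoint: dict = field(default_factory=dict)
    updates_blocked: bool = False

    def commit(self, updates: dict) -> bool:
        """Apply an error-free instruction's results unless updates are blocked."""
        if self.updates_blocked:
            return False
        self.checkpoint.update(updates)
        return True


@dataclass
class Core:
    """Hypothetical core: working registers, store queue, protected arrays."""
    recovery_unit: RecoveryUnit
    working: dict = field(default_factory=dict)
    store_queue: list = field(default_factory=list)  # (address, value) pairs
    memory: dict = field(default_factory=dict)
    arrays: dict = field(default_factory=dict)       # stand-in for protected arrays
    retry_limit: int = 3                             # assumed forward-progress bound

    def recover(self, retry_instruction) -> bool:
        ru = self.recovery_unit
        # 1) Preserve the checkpointed state by blocking all updates to it.
        ru.updates_blocked = True
        # 2) Release stores queued by previously checkpointed instructions.
        while self.store_queue:
            address, value = self.store_queue.pop(0)
            self.memory[address] = value
        # 3) Re-initialize the protected arrays (the role ABIST plays in hardware).
        for name in self.arrays:
            self.arrays[name] = 0
        # 4) Refresh the working copies from the checkpointed state.
        self.working = dict(ru.checkpoint)
        # 5) Resume execution; unblocking updates here is an assumption of this sketch.
        ru.updates_blocked = False
        # 6) Check forward progress: give up after retry_limit failed attempts.
        for _ in range(self.retry_limit):
            updates = retry_instruction(self.working)
            if updates is not None:  # instruction completed without error
                self.working.update(updates)
                return ru.commit(updates)
            self.working = dict(ru.checkpoint)  # discard partial results
        return False  # no forward progress was made
```

In this model `retry_instruction` is a callable that returns a dictionary of register updates on success or `None` when it hits an error again. A `False` return from `recover` corresponds to the case in which the processor cannot make forward progress and the error must be escalated.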
A given core cannot recover from some errors. When such an error is encountered, the processor must stop running. The checkpointed state of the stopped processor is often loaded into a spare processor, when available, where execution may be able to continue uninterrupted (from an end user's perspective). This action is referred to as a processor checkstop followed by dynamic central processor (CP) sparing, which is disclosed in U.S. Pat. No. 6,189,112 and U.S. Pat. No. 6,115,829. Upon detection of a non-recoverable, or checkstop, error, the following steps occur:
1) The processor tries to preserve the checkpointed state by immediately blocking all updates.
2) The processor notifies the system that the chip must stop running by driving the any_check line high to the clock chip.
3) The clock chip eventually stops the clocks to the checkstopped processor.
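The checkstop sequence and the subsequent CP sparing can be sketched in the same style. Again, this is a simplified model under assumptions: `Processor`, `ClockChip`, `poll`, and `spare` are hypothetical names, the any_check line is reduced to a boolean flag, and sparing is modeled as a plain copy of the checkpointed state.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical processor with a checkpointed state and an any_check line."""
    name: str
    checkpoint: dict = field(default_factory=dict)
    updates_blocked: bool = False
    any_check: bool = False       # models the line driven to the clock chip
    clocks_running: bool = True

    def checkstop(self) -> None:
        # 1) Try to preserve the checkpointed state by blocking all updates.
        self.updates_blocked = True
        # 2) Notify the system by driving any_check high.
        self.any_check = True


@dataclass
class ClockChip:
    """Hypothetical clock chip that samples each processor's any_check line."""
    stopped: set = field(default_factory=set)

    def poll(self, processors) -> None:
        # 3) Stop the clocks of every processor that has raised any_check.
        for cp in processors:
            if cp.any_check and cp.clocks_running:
                cp.clocks_running = False
                self.stopped.add(cp.name)


def spare(failed: Processor, spare_cp: Processor) -> bool:
    """Copy the checkpointed state to a spare; refuse if the failed CP is not stopped."""
    if failed.clocks_running or not failed.updates_blocked:
        return False
    spare_cp.checkpoint = dict(failed.checkpoint)
    return True
```

Calling `spare` before the clock chip has polled returns `False`, which reflects the ordering in the steps above: the state is transferred only after the checkstopped processor has actually been stopped.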
The problem with this design is that any core going through recovery or checkstop takes on a certain amount of risk. In the recovery case, a core risks causing Instruction Processing Damage (IPD). An IPD error indicates that operations previously queued by this processor may be suspect. The processor is reset in order to perform IPD recovery. This involves notifying the operating system that the task at hand must be aborted and retired. In the checkstop case, a core risks stopping in a state in which CP sparing is not possible. Thus, techniques are needed to handle processor recovery in multi-core environments.