The present invention relates to hardware recovery in a multi-threaded processor, and more particularly to a processor configured to detect errors in one thread and restore the thread while allowing other threads to continue executing uninterrupted.
Hardware recovery has been used to restore a processor to a known good or safe state after an error occurs. During a recovery process, which may last for thousands of CPU cycles, a processor first detects the occurrence of an error, stops executing the instruction stream, clears out its corrupted internal state, restores itself to a known error-free state, and restarts instruction processing from the point where the instruction stream last halted. During the recovery process, however, program flow is interrupted while the corrupted state is cleared and a known good state (or a hardware checkpoint state) is restored. Such a hardware-based process keeps the error recovery transparent to software applications and operations.
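The recovery sequence described above (stop, clear, restore, restart) can be illustrated with a minimal software sketch. This is a toy model only, not the hardware mechanism itself: the class `Core`, its fields, and the methods `take_checkpoint` and `recover` are hypothetical names introduced for illustration, and the single register `r0` stands in for the full architected state.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy model (hypothetical) of a single-threaded core with one checkpoint."""
    regs: dict = field(default_factory=lambda: {"r0": 0})
    pc: int = 0
    checkpoint: tuple = None          # (saved registers, saved program counter)
    running: bool = True
    log: list = field(default_factory=list)

    def take_checkpoint(self):
        # Save a copy of the known error-free state.
        self.checkpoint = (dict(self.regs), self.pc)

    def recover(self):
        # Called after an error has been detected.
        self.running = False          # 1. stop executing the instruction stream
        self.log.append("stop")
        self.regs.clear()             # 2. clear out the corrupted internal state
        self.log.append("clear")
        saved_regs, saved_pc = self.checkpoint
        self.regs = dict(saved_regs)  # 3. restore the known error-free state
        self.pc = saved_pc
        self.log.append("restore")
        self.running = True           # 4. restart instruction processing
        self.log.append("restart")


core = Core()
core.regs["r0"] = 7
core.pc = 4
core.take_checkpoint()

# Simulate an error that corrupts state after the checkpoint was taken.
core.regs["r0"] = 999
core.pc = 5
core.recover()
```

In this sketch the software running on the core never observes the corrupted value, which mirrors the transparency property noted above; the cost is that execution is halted for the duration of `recover`.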
Processors may be configured to execute one thread of instructions at a time or multiple threads at the same time. Processors configured to execute multiple threads simultaneously are said to be in simultaneous multithreading (SMT) mode. In simultaneous multithreading mode, hardware resources are shared among multiple software threads executing on a machine. Furthermore, in superscalar processors, multiple execution pipelines may be shared among the threads being dispatched into the hardware. Although SMT improves hardware efficiency by allowing multiple threads to rapidly share the available execution resources, it comes at a performance cost to the individual threads, since resource contention may arise between simultaneously executing threads. Conventional hardware error recovery that works on a processor executing a single thread does not work well on a processor running multiple threads, because any detected error requires the recovery process to be applied to all running threads even though the error may be isolated to a single running thread.
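The drawback of conventional recovery in SMT mode can be illustrated with a short sketch. It is a simplified model under stated assumptions: the function `conventional_smt_recovery`, the per-thread fields `pc` and `checkpoint_pc`, and the measure of discarded progress in instructions are all hypothetical and chosen only to show that every thread is rolled back when a single thread reports an error.

```python
def conventional_smt_recovery(threads, faulty_tid):
    """Roll every thread back to its checkpoint (hypothetical model).

    Returns, per thread id, how many instructions of progress were discarded.
    """
    if faulty_tid not in threads:
        raise KeyError(f"unknown thread id: {faulty_tid}")
    discarded = {}
    for tid, state in threads.items():
        # The error is isolated to faulty_tid, yet every thread is recovered.
        discarded[tid] = state["pc"] - state["checkpoint_pc"]
        state["pc"] = state["checkpoint_pc"]
    return discarded


# Two threads sharing one core; only thread 0 sees an error.
threads = {
    0: {"pc": 120, "checkpoint_pc": 100},
    1: {"pc": 250, "checkpoint_pc": 200},
}
discarded = conventional_smt_recovery(threads, faulty_tid=0)
```

Here thread 1 is error-free but is still interrupted and rolled back, which is the behavior the preceding paragraph identifies as the shortcoming of applying conventional recovery to a multithreaded processor.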