It is crucial to discover the origin of hardware errors. However, this is often not a trivial task, because errors propagate quickly to different units of the hardware. A large volume of signals must be stored to build an error history that helps in understanding the behavior of the chip being debugged. In an error-prone system, it is likely that multiple error scenarios occur, in which further errors arise while or after a first error is recovered. It is difficult to access debugging data in such multiple error scenarios. The error information is usually stored in a register of fixed size. The size of this register is chosen according to economic considerations and/or space limitations. Therefore, it is usually not possible to store a large amount of error debugging data in these registers. As a result, tracking the origin of the errors can be time-consuming and frustrating.
An advanced error checking and reporting structure allows the root cause of an error to be identified. Each processor unit has numerous error checkers, which can be analyzed after an error has occurred. However, if the errors are found to be recoverable, the error report structures are cleared by the recovery process. If, because of other interactions, the recovery process does not solve the problem, it is usually repeated a number of times. Once a given threshold is reached, an error that was thought to be recoverable can be escalated to a checkstop.
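The repeat-then-escalate behavior described above can be illustrated with a minimal sketch. It is not taken from any particular processor design: the names `run_recovery`, `attempt_recovery`, and `Outcome`, as well as the default threshold of 3, are assumptions chosen only for illustration.

```python
from enum import Enum


class Outcome(Enum):
    RECOVERED = "recovered"
    CHECKSTOP = "checkstop"


def run_recovery(attempt_recovery, threshold=3):
    """Repeat a recovery action until it succeeds or a threshold is reached.

    attempt_recovery: hypothetical callable returning True when the
        recovery action solved the problem.
    threshold: assumed maximum number of recovery attempts.
    Returns the outcome and the number of attempts that were made.
    """
    for attempt in range(1, threshold + 1):
        if attempt_recovery():
            return Outcome.RECOVERED, attempt
    # No attempt made forward progress: escalate the error.
    return Outcome.CHECKSTOP, threshold
```

For example, a recovery action that fails twice and then succeeds ends in `Outcome.RECOVERED` after three attempts, whereas one that never succeeds is escalated to `Outcome.CHECKSTOP` once the threshold is exhausted.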
Recoverable errors eventually lead either to a successful recovery or to a checkstop. Currently, when multiple errors occur in sequence without forward progress, only the last error is available and can be analyzed. The information on the previous errors is lost.
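The loss of earlier error information can be illustrated with a small sketch. It assumes a single-entry error report structure that each recovery action clears. The function name `surviving_error` and the error labels in the usage note are hypothetical.

```python
def surviving_error(error_sequence):
    """Return the error still readable after a run of failed recoveries.

    error_sequence: errors in the order they occurred, with no forward
        progress in between. Every error except the last is assumed to
        be followed by a recovery action that clears the report.
    """
    report = None  # hypothetical single-entry error report structure
    last_index = len(error_sequence) - 1
    for index, error in enumerate(error_sequence):
        report = error  # the error checkers capture the new error
        if index != last_index:
            report = None  # recovery clears the error report
    return report
```

Calling `surviving_error(["parity_unit_a", "timeout_unit_b", "parity_unit_c"])` returns only `"parity_unit_c"`. The two earlier entries, which may point closer to the root cause, are no longer available for analysis.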
A prior approach provides useful debugging information over three recovery actions with no forward progress. It describes a stack of combined recovery state information and error information. However, the access to the collected data is different: a register table is used to provide the debugging information over the three recovery actions. Such a solution yields recovery information, but only for the last error indications. The present invention aims to store more error information in the register for debugging purposes.