The technical field relates generally to digital computer systems and more particularly, but not by way of limitation, to systems for detecting errors within the instructions processed in such computer systems.
A central processing unit (CPU) may stop making forward progress for various reasons. For example, a CPU deadlock may occur when the code references a memory address that does not exist. In some systems, the memory controllers do not respond to such an erroneous memory reference, so the CPU deadlocks while waiting for data that will never return. When a CPU deadlock occurs, there must be some mechanism for releasing the CPU from this deadlocked state.
One such mechanism is to trigger a bus error, which clears the deadlock. However, a bus error has a substantial impact on the system because the system must then be restarted; in particular, the memory controllers must be reset. Triggering a bus error is therefore expensive in terms of the time and software required to recover. Moreover, a bus may be shared by multiple CPUs, in which case all of them usually must be reset when a bus error is triggered.
What is needed is a method and an apparatus to resolve the CPU deadlock without triggering a bus error, if possible. In particular, what is needed is a method that first attempts to resolve the CPU deadlock through software and, only if that attempt fails, invokes traditional methods of resolving the deadlock, such as triggering a bus error.
A method is provided for handling errors that deadlock a CPU by first attempting to resolve the deadlock without issuing a bus error and without restarting the computer. If the deadlock cannot be resolved without issuing a bus error, then a bus error is issued and the computer attempts to restart itself. The method involves comparing the number of clock cycles taken to execute an instruction with a designated abort value. When the instruction has taken the full abort value of cycles but has not retired, a machine-check abort (MCA) is issued to attempt to resolve the deadlock. The method also involves comparing the number of clock cycles with a larger bus error value. If the MCA does not break the deadlock within a certain period (i.e., before the bus error value is reached), then a bus error is issued and the computer attempts to reset.
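The two-threshold escalation described above can be illustrated with a small cycle-by-cycle simulation. This is a minimal sketch, not the claimed implementation: the function `resolve_stall`, the `Outcome` enum, and the parameters `retire_at` (cycle on which the instruction would retire on its own) and `mca_clears_after` (cycles the MCA handler would need to break the deadlock) are hypothetical names introduced only to model the behavior in software.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RETIRED = "retired normally"
    RESOLVED_BY_MCA = "deadlock cleared by MCA"
    BUS_ERROR = "bus error issued; system reset"


def resolve_stall(
    retire_at: Optional[int],
    mca_clears_after: Optional[int],
    abort_value: int,
    bus_error_value: int,
) -> Outcome:
    """Hypothetical model of the MCA-then-bus-error escalation for one instruction.

    retire_at: cycle on which the instruction retires by itself (None = never).
    mca_clears_after: cycles the MCA handler needs to break the deadlock
        (None = the MCA cannot break it).
    """
    if not 0 < abort_value < bus_error_value:
        raise ValueError("abort value must be positive and smaller than the bus error value")

    mca_issued_at: Optional[int] = None
    for cycle in range(1, bus_error_value + 1):
        # Normal case: the instruction retires before any threshold matters.
        if retire_at is not None and cycle >= retire_at:
            return Outcome.RETIRED
        # First, cheaper attempt: issue an MCA once the abort value is reached.
        if cycle == abort_value:
            mca_issued_at = cycle
        # The MCA handler succeeds before the bus error value is reached.
        if (mca_issued_at is not None and mca_clears_after is not None
                and cycle >= mca_issued_at + mca_clears_after):
            return Outcome.RESOLVED_BY_MCA
    # Bus error value reached with no retirement and no MCA recovery.
    return Outcome.BUS_ERROR
```

For example, with an assumed abort value of 100 cycles and bus error value of 1000 cycles, a stalled instruction whose MCA handler clears the deadlock 50 cycles after the MCA is issued yields `Outcome.RESOLVED_BY_MCA`, while one the MCA cannot help falls through to `Outcome.BUS_ERROR`. The abort value is required to be smaller than the bus error value so that the software-level attempt always gets a window before the costly reset.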
A computer system includes a CPU, a counter, and a software programmable register. The counter determines the number of clock cycles consumed during the execution of an instruction and stores that number in the register. The number of clock cycles taken to execute an instruction is compared with a designated abort value. When an instruction has taken the full abort value of cycles but has not retired, a machine-check abort (MCA) is issued to attempt to resolve the deadlock. The number of clock cycles is also compared with a larger bus error value. If the MCA does not break the deadlock within a certain period (i.e., before the bus error value is reached), then a bus error is issued and the CPU attempts to reset itself.
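The system components can be sketched as a per-cycle model in which the counter writes its value into a software-readable register and two comparisons decide whether to signal an MCA or a bus error. This is an illustrative assumption only: the class `CycleWatchdog`, its `tick()` and `retire()` methods, and the string actions `"MCA"` and `"BUS_ERROR"` are hypothetical, and real hardware would implement the counter and comparators in logic rather than software.

```python
from typing import Optional


class CycleWatchdog:
    """Hypothetical software model of the counter, register, and comparators."""

    def __init__(self, abort_value: int, bus_error_value: int) -> None:
        if not 0 < abort_value < bus_error_value:
            raise ValueError("abort value must be positive and smaller than the bus error value")
        self.abort_value = abort_value
        self.bus_error_value = bus_error_value
        self._counter = 0   # clock cycles consumed by the current instruction
        self.register = 0   # software-visible copy of the counter value

    def retire(self) -> None:
        """The current instruction retired: restart counting for the next one."""
        self._counter = 0
        self.register = 0

    def tick(self) -> Optional[str]:
        """Advance one clock cycle and return the action to signal, if any."""
        self._counter += 1
        self.register = self._counter  # counter stores its value in the register
        if self.register == self.abort_value:
            return "MCA"        # first attempt to break the deadlock
        if self.register == self.bus_error_value:
            return "BUS_ERROR"  # MCA did not help in time; reset required
        return None
```

With assumed thresholds of 3 and 6 cycles, six ticks without a call to `retire()` produce `[None, None, "MCA", None, None, "BUS_ERROR"]`; calling `retire()` at any point before the sixth tick prevents the bus error.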