1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a method and apparatus for resolving livelock in a data processing system.
2. Description of Related Art
Deadlock is a situation where two or more processes in a data processing system are unable to proceed because each is waiting for one of the others to do something. A common example is a program communicating with a server: the program may be waiting for output from the server before sending anything more, while the server is similarly waiting for more input from the controlling program before outputting anything. This particular type of deadlock is sometimes called a “starvation deadlock,” although the term “starvation” is more properly used for situations where a program can never run simply because it never gets high enough priority.
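The program and server example can be illustrated with a minimal Python sketch. The names `wait_for_peer` and `run_deadlock_demo` are hypothetical and chosen only for illustration. Each side waits on an inbox that only the other side could fill, and neither ever sends. The timeout is an assumption added so that the demonstration terminates; in a true deadlock both sides would wait indefinitely.

```python
import queue
import threading


def wait_for_peer(name, inbox, outcome, timeout):
    """Wait for a message from the peer before doing anything else."""
    try:
        inbox.get(timeout=timeout)
        outcome[name] = "progressed"
    except queue.Empty:
        # Nothing arrived: this side never got what it was waiting for.
        outcome[name] = "blocked"


def run_deadlock_demo(timeout=0.2):
    """Start a client and a server that each wait for the other to go first."""
    to_client = queue.Queue()  # output the server would send to the program
    to_server = queue.Queue()  # input the program would send to the server
    outcome = {}
    client = threading.Thread(
        target=wait_for_peer, args=("client", to_client, outcome, timeout)
    )
    server = threading.Thread(
        target=wait_for_peer, args=("server", to_server, outcome, timeout)
    )
    client.start()
    server.start()
    client.join()
    server.join()
    return outcome


if __name__ == "__main__":
    print(run_deadlock_demo())
```

Because neither thread ever places anything in the other's inbox, both report that they are blocked. Their states do not change while they wait, which is the distinguishing feature of a deadlock.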
A livelock is similar to a deadlock, except that the states of the processes involved in the livelock constantly change with regard to one another. As a real world example, livelock occurs when two people meet in a narrow corridor, and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress because they always both move the same way at the same time. In a data processing system, two or more processing elements may be stuck in loops because each processing element repeatedly reaches a point in the loop where it must tell the other to retry a particular command. A livelock can occur, for example, when a process that calls another process is itself called by that process. A livelock may be caused by malicious code or by a software or hardware design bug.
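The corridor example can be sketched as a small deterministic simulation in Python. The function name `simulate_corridor`, the two-sided corridor model, and the round limit are assumptions made only for illustration. Both people step aside in every round, so their state keeps changing, yet they always end up on the same side and never pass.

```python
def simulate_corridor(max_rounds=6):
    """Simulate two people facing each other in a two-sided corridor.

    Each person stands on side 0 or side 1. They can pass only when they
    stand on different sides. Returns (passed, history), where history
    lists the sides occupied after each round of stepping aside.
    """
    a_side, b_side = 0, 0  # both start on the same side, blocking each other
    history = []
    for _ in range(max_rounds):
        if a_side != b_side:
            return True, history  # the way is clear, so progress is made
        # Both politely step aside at the same time and in the same direction.
        a_side = 1 - a_side
        b_side = 1 - b_side
        history.append((a_side, b_side))
    return False, history  # state changed every round, but nobody passed


if __name__ == "__main__":
    passed, history = simulate_corridor(4)
    print(passed, history)
```

Unlike a deadlock, the recorded history shows a different state after every round. The lack of progress comes from the two parties reacting identically and simultaneously, not from either of them standing still.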
A number of solutions in the prior art are concerned with preventing livelocks in a multiprocessor system. However, despite these efforts, combinations of software sequences and the way the hardware executes them may still conspire to create a livelock.
A multiprocessor system typically provides each processing element with a watchdog timer. If a command begins and the watchdog timer expires without any progress being made on the command, the processing element may signal that a “hang” has occurred. A hang is a freezing condition where the processor cannot continue execution. In the prior art, a service processor or control processor may signal a checkstop to the processors upon a hang condition, thus stopping the clock.
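The watchdog behavior described above can be modeled in software with a short Python sketch. The class `WatchdogTimer`, the function `run_command`, and the tick-based notion of time are hypothetical simplifications; an actual watchdog timer is a hardware facility, and the checkstop response of the service processor is not modeled here.

```python
class WatchdogTimer:
    """Counts ticks without progress and flags a hang when a limit is hit."""

    def __init__(self, limit):
        self.limit = limit  # ticks allowed without progress (assumed unit)
        self.elapsed = 0
        self.hang = False

    def command_started(self):
        """Restart the timer when a new command begins."""
        self.elapsed = 0
        self.hang = False

    def progress_made(self):
        """Reset the count whenever the command makes progress."""
        self.elapsed = 0

    def tick(self):
        """Advance one tick with no progress; return True once hung."""
        if not self.hang:
            self.elapsed += 1
            if self.elapsed >= self.limit:
                self.hang = True
        return self.hang


def run_command(progress_trace, limit):
    """Return the tick at which a hang is signaled, or None if none occurs.

    progress_trace holds one boolean per tick: True if the command made
    progress during that tick, False otherwise.
    """
    watchdog = WatchdogTimer(limit)
    watchdog.command_started()
    for tick_no, progressed in enumerate(progress_trace, start=1):
        if progressed:
            watchdog.progress_made()
        elif watchdog.tick():
            return tick_no
    return None


if __name__ == "__main__":
    print(run_command([False, False, False, False], limit=3))
    print(run_command([False, False, True, False, False], limit=3))
```

In this model, a command that never progresses triggers the hang signal as soon as the limit is reached, while a command that progresses before the limit resets the count and avoids the signal.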