1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a system and method for placing a processor into a gradual slow mode of operation. The slow mode of operation of the processor is used to break out of a livelock condition within the processor core.
2. Description of Related Art
In modern processor design, especially in a simultaneous multi-threading (SMT) processor design, livelock conditions are among the most difficult problems to find and solve during the design phase, and are often not found until later, when the design is implemented in hardware. A livelock condition can be described as a processor executing the same set of instructions repeatedly without making any real progress. One root cause of a livelock condition is the “harmonic” fixed-cycle interaction between one processor unit and another. Other causes are repeated flushing and re-fetching of instructions, and repeated re-issuing from an issue queue without ever completing an instruction, because a flush condition recurs on a resource full or busy condition that cannot be freed up.
“Harmonic” fixed-cycle interactions are a product of the fact that, in a processor pipeline, a fixed number of cycles are used to process and complete execution of an instruction. Instructions from multiple threads will be processed through the processor pipeline in a harmonic manner such that each instruction in each thread completes processing at substantially the same time. If there are dependencies between threads of execution, such that one thread requires the results of an instruction in another thread of execution, a livelock condition may occur because both instructions are processed through the processor pipeline at the same time and there are resource and dependency conflicts. The dependent instruction will not be able to complete because the result of the instruction in the other thread is not yet available. If the instructions in the other thread encounter a resource conflict, instructions from both threads will be repeatedly flushed and re-issued to the pipeline.
An example of code and a dual instruction issue processor design that results in a livelock condition is shown in FIGS. 6A and 6B. FIG. 6A illustrates an SMT and dual instruction issue processor pipeline design, where two instructions are issued from each thread every other cycle. FIG. 6B illustrates exemplary user code which, when executed on the processor pipeline shown in FIG. 6A, causes both threads to try to access a processor's special purpose register (SPR). Thread0 is coded with a Branch-Not-Equal (bneq) instruction to wait on the Thread1 code to complete its SPR access. The Thread0 code will keep branching back and checking whether Thread1's “store” instruction is done.
The SPR queue in the processor pipeline design shown in FIG. 6A can only support two SPR instructions at a time. Thus, the third SPR instruction in Thread1, and all instructions behind the third SPR instruction, will always be flushed and re-issued. If both Thread0's and Thread1's instructions are compiled in the order shown in Table 1 below, all of Thread1's instructions that are ahead of the “mt sprC R1” instruction will be completed. Thread1's “mt sprC R1” will get flushed initially because the SPR queue is busy with “mt sprA R3” and “mt sprB R2.” All of Thread0's instructions that are after “bneq CheckThread1SPR” will be issued and flushed as long as Thread1's “store R4 0x100” is not executed.
TABLE 1
Example Code Stream that Creates a Livelock Condition

Issue Cycle   Issue Slot 0     Issue Slot 1
n             ld R5, 0x100
n + 1         ld R1, 0x2C00
n + 2         cmp R4, R5       bneq CheckThread1SPR
n + 3         mt sprA R3       mt sprB R2
n + 4         mt sprD R6       mt sprE R7 (Thread0 flush due to branch misprediction)
n + 5         mt sprC R1       store R4 0x100 (Thread1 flush due to SPR Queue is full)
n + 6         mt sprE R8
n + 7         b R10
Table 2 illustrates the code sequence that remains after Thread1's instructions at n+1 and n+3 have completed; the rest of the instructions in both threads are re-issued, flushed, and re-issued again. As shown in Table 2, Thread1's “mt sprC R1” will again get flushed because Thread0's “mt sprD R6” and “mt sprE R7” are issued ahead of it. These two “move to SPR” instructions in Thread0 will eventually get flushed because of a branch misprediction. The two flush conditions (Thread0's branch misprediction flush and the flush of Thread1's third move-to-SPR instruction) recur continually because the issue unit is in a “harmonic” window. Therefore, both Thread0's and Thread1's instructions in Table 2 will re-issue and flush over and over again, forcing the processor into a livelock condition.
TABLE 2
Example Code Stream that Creates a Livelock Condition

Issue Cycle   Issue Slot 0     Issue Slot 1
n             ld R5, 0x100
n + 1
n + 2         cmp R4, R5       bneq CheckThread1SPR
n + 3
n + 4         mt sprD R6       mt sprE R7 (Thread0 flush due to branch misprediction)
n + 5         mt sprC R1       store R4 0x100 (Thread1 flush due to SPR Queue is full)
n + 6         mt sprE R8
n + 7         b R10
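The repeating flush pattern of Table 2 can be illustrated with a small software model. The following is only a sketch under simplifying assumptions, not a description of the actual pipeline: each pass through the issue window starts with an empty SPR queue (flushed instructions are assumed to vacate it), and the function names and the thread0_issues_first parameter are hypothetical and introduced purely for illustration. The counterfactual ordering is included only to show that the lack of progress depends on the fixed issue order.

```python
SPR_QUEUE_CAPACITY = 2  # per the text: the SPR queue supports two SPR instructions

THREAD0_SPECULATIVE = ["mt sprD R6", "mt sprE R7"]   # issued past the bneq
THREAD1_PENDING = ["mt sprC R1", "store R4 0x100"]   # third SPR move and the store


def run_window(thread0_issues_first=True):
    """Model one pass through the issue window; return (store_done, flushed)."""
    spr_queue = []
    flushed = []
    store_done = False

    def issue_thread0():
        for instr in THREAD0_SPECULATIVE:
            if len(spr_queue) < SPR_QUEUE_CAPACITY:
                spr_queue.append(instr)
            else:
                flushed.append(instr)          # no room in the SPR queue

    def issue_thread1():
        nonlocal store_done
        if len(spr_queue) < SPR_QUEUE_CAPACITY:
            spr_queue.append(THREAD1_PENDING[0])
            store_done = True                  # the store behind the SPR move completes
        else:
            flushed.extend(THREAD1_PENDING)    # SPR queue full: flush and retry later

    if thread0_issues_first:                   # the fixed order shown in Table 2
        issue_thread0()
        issue_thread1()
    else:                                      # hypothetical counterfactual order
        issue_thread1()
        issue_thread0()

    if not store_done:
        # The bneq loops back, so Thread0's speculative SPR moves are flushed too.
        flushed.extend(THREAD0_SPECULATIVE)
    return store_done, flushed


def passes_until_progress(max_passes, thread0_issues_first=True):
    """Return the pass on which the store completes, or None if it never does."""
    for n in range(1, max_passes + 1):
        done, _ = run_window(thread0_issues_first)
        if done:
            return n
    return None
```

In this model, the fixed order of Table 2 never lets the store complete, however many passes are simulated, because every pass begins from the same state and issues in the same order. With the order reversed, the store completes on the first pass.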
Livelock conditions, such as that described above, are often detected only when real applications are run on the processor hardware. At that point it is often too late, and too costly, to fix the design. Therefore, it is important to have a general method built into the processor core to detect and resolve these kinds of unexpected livelock conditions that are found in the hardware validation period.
Typically, a hardware-based detection mechanism, which may be hard-coded into the processor design itself, such as in the issue unit of the processor, is provided to detect such livelock conditions. The manner by which the detection mechanism may detect the livelock condition depends upon the particular implementation. For example, the detection mechanism may detect a livelock condition by counting a number of flushes of a particular instruction, or the number of times an instruction re-issues, without completing. Such situations indicate a change of state without forward progress and hence, a potential livelock scenario.
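The flush-counting approach described above can be sketched in software as follows. This is a minimal illustration only: the class name, method names, and threshold value are assumptions made for the example, and an actual detection mechanism would be implemented in hardware, for example in the issue unit.

```python
from collections import defaultdict


class LivelockDetector:
    """Toy model: flags a potential livelock when the same instruction
    is flushed repeatedly without completing."""

    def __init__(self, flush_threshold=8):     # threshold value is an assumption
        self.flush_threshold = flush_threshold
        self.flush_counts = defaultdict(int)   # (thread, address) -> flush count

    def on_flush(self, thread_id, instr_addr):
        """Record a flush; return True once the threshold is reached."""
        key = (thread_id, instr_addr)
        self.flush_counts[key] += 1
        return self.flush_counts[key] >= self.flush_threshold

    def on_complete(self, thread_id, instr_addr):
        """Completion is forward progress, so the instruction's count is cleared."""
        self.flush_counts.pop((thread_id, instr_addr), None)
```

For example, with a threshold of three, the third consecutive flush of the same instruction is reported as a potential livelock, while a completion in between resets the count for that instruction.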
In known designs, in response to detecting the occurrence of a livelock condition using the livelock detection mechanism, the pipeline of the processor is placed into a single-step mode of operation such that only one instruction is completed at a time across one or more threads. However, placing the pipeline of the processor into a single-step mode each time a livelock condition is detected significantly degrades the overall performance of the processor, since the livelock condition may occur very often in the execution of a program. This approach is also excessive if the livelock situation has a livelock window, i.e. a period of time in which the livelock condition will continue to be detected in the processor, of only a few processor cycles.