1. Technical Field
The present invention generally relates to design structures and in particular to design structures for dynamic livelock resolution in processor systems.
2. Description of the Related Art
To increase microprocessor performance, microarchitectures and memory subsystems employ a variety of techniques which allow multiple instructions to execute simultaneously. Superscalar instruction issue and speculative execution are two strategies that improve performance but also significantly increase overall system design complexity.
Occasionally, during instruction execution, a situation occurs whereby instructions are repeatedly issued, but a recurring transient conflict inhibits the forward progress of the execution of the instructions. This condition is called a system livelock, and may be caused by any one of a number of conflict-generating instruction execution sequences. With the addition of system bus interactions (e.g., snooped operations) and multiprocessor shared-memory architectures in conventional processing systems, livelocks become even more likely. In conventional systems, system livelock is typically caused by one of the following conditions: (a) repeated flushing of instructions as a result of structural hazards, which consequently causes the instructions to be speculatively refetched or repeatedly re-issued from an issue queue; and (b) “harmonic” fixed-cycle interactions between multiple processing units, where one unit inhibits the forward progress of another.
A repeated flush livelock condition commonly occurs when a “full” or “busy” resource, such as an ERAT, SPR, LMQ, STQ, etc., is unable to receive the instruction (or associated request) due to the repeated execution of a particular instruction or sequence of instructions. The above acronyms are defined as follows: ERAT—Effective to Real Address Table; SPR—Special Purpose Register; LMQ—Load Miss Queue; and STQ—Store Queue.
A harmonic livelock condition results when an instruction is repeatedly discarded. The discard is triggered when (a) an instruction enters the pipeline just before the required resource becomes available and (b) the processor changes state such that the resource can no longer become available by the time the instruction reaches it. A harmonic livelock results when certain conditions cause this two-step process to repeat indefinitely.
Execution of the code sequence below may provide a catalyst for the conditions that result in a harmonic livelock.
    load A
    store A
    (several stores . . . )
    store B (store queue full flush)
    load C (flushed along with store B)
As provided, a load from cache line A (referred to as “ld A”) is followed by several stores, including a store to cache line A (referred to as “st A”). In this example, the load misses the cache so the subsequent store to the same address is placed into the store queue, waiting for the load to be serviced so that correct in-order memory access to the same address will be preserved. More stores are issued, thus filling up the store queue. After the store queue becomes full, the store to cache line B (“st B”) is issued. This store and all younger instructions are flushed because the store queue has no available entries.
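The store-queue-full flush in this sequence can be sketched as a toy model. This is only an illustration under stated assumptions, not the hardware described here: the function name `issue_in_order`, the capacity of 3 entries, and the placeholder stores `st X` and `st Y` are hypothetical, and the model assumes that no store drains from the queue while the miss for ld A is outstanding.

```python
def issue_in_order(program, store_queue_capacity):
    """Issue instructions until a store finds the store queue full.

    Returns (issued, store_queue, flushed). The store that finds the queue
    full and every younger instruction are flushed together.
    Assumption: queued stores never drain, because the load miss is pending.
    """
    issued = []
    store_queue = []
    for index, instr in enumerate(program):
        if instr.startswith("st "):
            if len(store_queue) == store_queue_capacity:
                # No free entry: flush this store and all younger instructions.
                return issued, store_queue, program[index:]
            store_queue.append(instr)
        issued.append(instr)
    return issued, store_queue, []


# "st X" and "st Y" stand in for the "several stores" of the example.
program = ["ld A", "st A", "st X", "st Y", "st B", "ld C"]
issued, store_queue, flushed = issue_in_order(program, store_queue_capacity=3)
print(flushed)  # ['st B', 'ld C']
```

In this sketch, ld C is flushed only because it is younger than st B, which matches the "flushed along with store B" annotation in the sequence.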
Ld C has the same address subset for indexing the cache arrays (i.e., the same congruence class address) as ld A. In this example, the load from cache line C (“ld C”) following st B was speculatively issued and sent to the memory subsystem before the store queue conflict was determined. In the case where ld A is rejected because of a collision with a previous load or store shortly before the ld C request was sent to the memory subsystem controller, the design of most conventional memory queues allows the possibility that ld C may be accepted before ld A. Due to memory access restrictions to the same cache congruence class, the memory servicing of ld C, which is accepted out of order by the memory controller, now presents a new restriction that inhibits the servicing of ld A.
Under normal operation, this method of age-independent load handling provides a performance boost because it enables out-of-order instruction execution in the absence of data dependencies. However, side effects of this enhancement include unexpected, problematic circular conflicts. In the above example, the ld C instruction, which blocked ld A, is flushed as a result of the st B flush. The st A instruction cannot be serviced because ld A was blocked by ld C. Once ld A is blocked, ld A is sent to the retry delay queue in the memory subsystem. A livelock condition may occur when the st B and ld C instructions are speculatively re-issued. Ld C is sent to the memory subsystem controller before ld A has time to pass through the retry delay queue and attempt a memory access. Again, ld A is blocked by ld C due to the congruence class conflict. Without some intervention, this process will repeat indefinitely.
The above execution process typically occurs in a conventionally designed processor system, such as that illustrated by FIG. 1. FIG. 1 depicts the configuration of a conventional processor and memory subsystem which are utilized for handling basic instruction processing and livelock conditions, according to the prior art. FIG. 1 shows conventional load and store (Ld/St) queue 400 that serves as the staging and retry delay queue between processor core 100 and the memory subsystem. The retry delay queue includes a series of sequential stages (illustrated as latches) 410, 420, 425, and 430. When Ld/St queue 400 is empty, a new request takes bypass path 405 and the request is forwarded to Memory Subsystem Controller 500. If this request is denied by memory subsystem controller 500 due to a resource conflict, the request enters first stage 410 of the retry delay queue using path 455 and travels through each stage (410, 420, 425, and 430) of retry delay queue 455. Subsequent loads or stores from the processor bypass retry delay queue 455 unless a previously delayed retry entry has reached final queue stage 430. When the previously delayed entry is in final queue stage 430, the retry request is again sent to Memory Subsystem Controller 500, while the new request enters the delay queue at stage 410 using path 407. Retry delay queue 455 provides a reasonably fair retry scheme with a pipeline that forces an instruction to wait for its conflict to clear. Waiting for any conflicts to clear then allows memory subsystem controller 500 to service new requests destined for unoccupied resources.
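The bypass-and-retry behavior of the retry delay queue can be sketched in software. The following is a minimal model under stated assumptions, not a description of the circuit in FIG. 1: the class name `RetryDelayQueue`, the `accepts` callback standing in for the memory subsystem controller, and the choice to track a per-entry countdown instead of physical latches are all hypothetical. The countdown is used because the text does not say what happens when a rejected retry and a diverted new request both need the first stage in the same cycle.

```python
from collections import deque


class RetryDelayQueue:
    """Toy fixed-duration retry delay queue (timing only, no latches)."""

    def __init__(self, depth=4):
        self.depth = depth        # number of stages a rejected request waits
        self.waiting = deque()    # entries are [request, cycles_left]

    def cycle(self, new_request, accepts):
        """Advance one cycle; return the request serviced, or None."""
        for entry in self.waiting:
            entry[1] -= 1         # every delayed entry moves one stage
        if self.waiting and self.waiting[0][1] <= 0:
            # A delayed entry reached the final stage: it is retried, and
            # any new request is diverted into the delay queue instead.
            sent = self.waiting.popleft()[0]
            if new_request is not None:
                self.waiting.append([new_request, self.depth])
        else:
            sent = new_request    # bypass path straight to the controller
        if sent is None:
            return None
        if accepts(sent):
            return sent
        self.waiting.append([sent, self.depth])  # denied: wait again
        return None


queue = RetryDelayQueue(depth=4)
busy = {"ld A"}                   # hypothetical resource conflict for ld A


def accepts(request):
    return request not in busy


results = [queue.cycle("ld A", accepts)]      # denied, enters the queue
results.append(queue.cycle("ld X", accepts))  # bypasses the waiting ld A
busy.clear()                                  # the conflict clears
results += [queue.cycle(None, accepts) for _ in range(3)]
print(results)  # [None, 'ld X', None, None, 'ld A']
```

Even after the conflict clears, ld A is not retried until its fixed delay has elapsed, while the younger `ld X` is serviced immediately through the bypass.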
As described above, the processing system of FIG. 1 utilizes a fixed-duration retry delay queue. When the above sequence is executed within the processing structure of FIG. 1, several potential downsides are observed with regard to handling a harmonic livelock. Executing the example execution sequence above, ld A is rejected by memory controller 500 and placed at the top of the retry delay queue at stage 410. At some time before ld A can be successfully serviced, ld C enters queue structure 400, proceeds to memory subsystem controller 500 via bypass path 405, and is accepted. The timing of this occurrence is purely by chance, but its occurrence has been demonstrated in real systems.
The servicing of ld C provides an address collision conflict which causes memory subsystem controller 500 to again reject ld A when ld A reaches the end of the queue. As described earlier, the st B instruction preceding ld C is flushed, causing the results of ld C to be discarded. In response to the flush, the processor core immediately retries the st B and ld C instructions, expecting the resource conflict to be resolved. Again, the ld C instruction bypasses the ld A instruction, which has returned to the retry delay queue, and thus continues to hinder the progress of ld A. A livelock occurs because instructions are repeatedly issued (st B, ld C) but the blockage of ld A caused by ld C prohibits the possibility of freeing store queue entries and thus prohibits forward progress.
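The harmonic nature of this livelock can be illustrated with a small timing simulation. This is a sketch under assumptions that do not come from the text: the function name `first_service_cycle`, all cycle counts, and the idea that ld C holds the congruence class for a fixed `busy_window` are hypothetical. As a further simplification, when ld C and the ld A retry fall in the same cycle, the model grants ld C first.

```python
def first_service_cycle(retry_delay, reissue_period, first_ld_c,
                        busy_window, max_cycles=1000):
    """Return the cycle in which ld A is finally serviced, or None.

    ld A is rejected at cycle 0 and retried every `retry_delay` cycles.
    The core re-issues st B / ld C every `reissue_period` cycles, starting
    at `first_ld_c`; each accepted ld C blocks the congruence class that
    ld A needs for `busy_window` cycles.
    """
    busy_until = -1  # last cycle in which ld C blocks ld A
    for cycle in range(1, max_cycles + 1):
        if cycle >= first_ld_c and (cycle - first_ld_c) % reissue_period == 0:
            # ld C takes the bypass path and is accepted by the controller.
            busy_until = cycle + busy_window - 1
        if cycle % retry_delay == 0 and cycle > busy_until:
            return cycle  # the ld A retry finds the congruence class free
    return None           # no forward progress within max_cycles


# Re-issue period equals the retry delay: ld C always arrives one cycle
# before the ld A retry, so ld A never gets through.
print(first_service_cycle(retry_delay=4, reissue_period=4,
                          first_ld_c=3, busy_window=2))  # None

# A different re-issue period breaks the alignment.
print(first_service_cycle(retry_delay=4, reissue_period=6,
                          first_ld_c=3, busy_window=2))  # 8
```

In this model the livelock depends only on the two periods staying aligned, which is consistent with the observation that the timing arises by chance and then repeats.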
A similar livelock condition may occur when multiple threads in a Simultaneous Multithreading (SMT) processor try to access a shared resource. SMT processors alternate between multiple threads when selecting instructions to dispatch. A harmonic livelock condition may occur where one thread accesses a resource in the cycle before a second thread tries to access the same resource. The second thread is flushed because the resource is occupied by the first thread. If the first thread's progress is dependent on a result from the second thread, the system will experience livelock because forward progress is impossible when the first thread repeatedly blocks the second thread. The risk for livelocks is further increased when multiple processors share the same secondary memory system.
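The SMT case can be sketched the same way. The model below is hypothetical throughout: the function name `smt_first_success`, the strict two-thread alternation, and the `hold_cycles` parameter are assumptions for illustration. It assumes that thread 0 depends on a result from thread 1 and therefore keeps retrying and re-occupying the shared resource.

```python
def smt_first_success(hold_cycles, cycles=100):
    """Return the cycle in which thread 1 gets the resource, or None."""
    busy_until = -1  # last cycle in which thread 0 occupies the resource
    for cycle in range(cycles):
        thread = cycle % 2  # dispatch alternates between the two threads
        if thread == 0:
            # Thread 0 cannot complete without thread 1's result, so it
            # retries and occupies the shared resource again.
            busy_until = cycle + hold_cycles - 1
        elif cycle > busy_until:
            return cycle    # thread 1 finds the resource free
        # otherwise thread 1 is flushed and tries again on its next turn
    return None


print(smt_first_success(hold_cycles=2))  # None
print(smt_first_success(hold_cycles=1))  # 1
```

With a two-cycle hold, thread 0 always occupies the resource during thread 1's dispatch slot, so neither thread makes forward progress.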
Livelock conditions are usually hard to predict and recreate and/or identify in simulation. The software execution bugs that cause livelocks are often found later in the hardware validation process. Breaking out of unanticipated livelock conditions presents a difficult challenge for the design of high performance microprocessors. However, designs which include advanced livelock avoidance features may save significant test and redesign expenses. Therefore, backup mechanisms are often included within a processor core. These backup mechanisms are designed to dynamically break livelock conditions.
Designing livelock correction mechanisms requires careful analysis to cover all unforeseen potential livelock scenarios. Several proposed solutions for livelock correction focus primarily on one of the following: (1) bus accesses between multi-processor systems, including specific changes to writeback protocols in anticipation of livelocks [U.S. Pat. No. 6,279,085], (2) distributed synchronization and delay management of snoop requests [U.S. Pat. Nos. 6,523,076 and 6,968,431], and (3) the implementation of random arbitration schemes [U.S. Pat. No. 5,761,446]. Other solutions focus solely on data sharing livelocks [U.S. Pat. No. 6,078,981]. However, none of these proposed methods resolves the different types/forms of livelocks in an efficient manner.
Given the above problems presented by the occurrence of livelocks, the present invention recognizes that it would be desirable to provide a mechanism to efficiently resolve and reduce system livelocks within a data processing system.