This invention relates to high performance computer systems, and particularly to Address Generation Interlock (AGI) support in microprocessors that execute an instruction set, usually a complex instruction set computer (CISC) instruction set, which includes multi-cycle instructions that load a number of general purpose registers.
In the art of microprocessor design, one technique to improve performance is the use of “pipelining.” Pipelines improve performance by allowing a number of instructions to work their way through the microprocessor at the same time.
Consider that most processors run programs by loading an instruction from memory; decoding the instruction; loading associated data from registers or memory that is needed to process the instruction; processing the instruction; and storing any associated results in registers or memory. Pipelines are usually characterized in terms of their depth (N), i.e., the number of processing stages, with N=5 in this case. Complicating this series of steps is the fact that access to the memory involves a lengthy delay (in terms of processing time or cycles). The memory includes a memory hierarchy (not shown) that can be made up of caches, main memory (i.e., random access memory) and other memory such as non-volatile storage like hard disks.
If each of these steps of running programs is implemented as a pipeline stage, then the microprocessor may start to decode a new instruction while an older instruction waits for results to continue. This permits up to N instructions to be “in flight” at one time, making the microprocessor appear to be up to N times as fast. Although any one instruction takes just as long to complete (there are still N steps) the microprocessor as a whole “retires” instructions much faster. Since each stage involves less work, and thus requires less time, a processor with more stages can usually be run at a higher clock speed.
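By way of illustration only, the throughput argument above can be sketched as simple cycle arithmetic. The sketch assumes an idealized pipeline with one cycle per stage and no stalls or restarts; the function names are hypothetical and are not taken from any described embodiment.

```python
def unpipelined_cycles(num_instructions, depth):
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return num_instructions * depth


def pipelined_cycles(num_instructions, depth):
    """Cycles needed in an ideal pipeline: fill once, then retire one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return depth + (num_instructions - 1)


def speedup(num_instructions, depth):
    """Ratio of unpipelined to pipelined cycle counts (approaches depth for long streams)."""
    return unpipelined_cycles(num_instructions, depth) / pipelined_cycles(num_instructions, depth)


if __name__ == "__main__":
    # With N=5 stages, a single instruction still takes 5 cycles,
    # but a long stream approaches (without reaching) a 5x speedup.
    for count in (1, 10, 100, 1000):
        print(count, pipelined_cycles(count, 5), round(speedup(count, 5), 3))
```

Under these assumptions, any one instruction still takes N cycles, while the speedup for a long instruction stream approaches, but never exceeds, N.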
Multiple instructions can occupy different stages in an execution unit at the ideal back-to-back rate, leading to increased throughput and overall performance. While effective, pipelining is unfortunately limited by two major factors. A first factor is the extent to which the pipeline can be supplied with new instructions to process, which is essentially a function of restart penalties (e.g., branch-wrong recovery and exception conditions, including I/O interrupts). A second factor is the amount of resource interdependency in the instruction stream.
With regard to the second factor, consider that during processing, an operation for an instruction in one stage of the pipeline (i.e., a “consumer”) may be dependent on results from another instruction (i.e., a “producer”). The other instruction may be an earlier instruction that is executing in a later stage of the pipeline. Ideally, results are produced and available for consumption when needed; otherwise, the pipeline will suffer a stall at the point of consumption until the results are ready. To avoid such a stall, the cycles between the producer and the consumer of the result should be occupied by operations of other, independent instructions, such that the pipeline stages are all filled and the result is ready when it is needed. Modern compilers employ instruction re-ordering techniques to improve the spacing of dependent instructions, but this is limited by the instruction stream (within a software program) itself.
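As a rough illustration of the producer and consumer spacing described above, the following sketch computes the stall seen by a consumer in a simplified in-order pipeline that issues one instruction per cycle. The cycle offsets and the function name are assumptions made for the example and do not describe any particular microprocessor.

```python
def stall_cycles(distance, produce_offset, consume_offset):
    """Stall cycles suffered by a consumer instruction.

    distance       -- issue slots between producer and consumer (1 = back to back)
    produce_offset -- cycles after the producer issues until its result is usable
    consume_offset -- cycles after the consumer issues until it needs the result

    The producer issues at cycle 0 and the consumer at cycle `distance`
    (assumed model: in-order, one instruction issued per cycle).
    """
    result_ready = produce_offset
    result_needed = distance + consume_offset
    return max(0, result_ready - result_needed)


if __name__ == "__main__":
    # Assumed offsets: result usable 4 cycles after issue, needed 2 cycles after issue.
    for gap in (1, 2, 3):
        print(gap, stall_cycles(gap, produce_offset=4, consume_offset=2))
```

In this model, placing one independent instruction between the producer and the consumer (a distance of 2) is enough to hide the dependency, which is the effect that compiler re-ordering attempts to achieve.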
Instructions that occupy or update multiple resources are the most likely to cause interdependencies which adversely affect performance, since their corresponding operations tend to be longer. One example of such an instruction is a Load Multiple (LM) type of instruction, which writes a plurality of General Purpose Registers (GPRs) using data located in a range of sequential addresses. Because the data fetching sequence can be long and can involve a large number of registers, known solutions in this area involve tracking only the updates of the actual architected registers representing the GPRs. Tracking of the GPR updates permits the dispatch of younger dependent instructions only when the results are available in the GPR. Tracking may be accompanied by use of Address Generation Interlock (AGI) techniques to delay the consumer instruction(s) in the address generation (AGEN) stage until the data is available in the GPR.
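The behavior of an LM-type instruction and the per-register tracking described above can be sketched in software as follows. This is a behavioral model only, under stated assumptions: sixteen GPRs, 4-byte big-endian words, a register range that wraps from register 15 to register 0, and one register written per step. The names `lm_register_range` and `load_multiple` are hypothetical.

```python
NUM_GPRS = 16      # assumed number of general purpose registers
WORD_BYTES = 4     # assumed width of each loaded word


def lm_register_range(first, last):
    """Registers written by the instruction, in order, wrapping from 15 to 0."""
    regs = []
    reg = first
    while True:
        regs.append(reg)
        if reg == last:
            return regs
        reg = (reg + 1) % NUM_GPRS


def load_multiple(gprs, memory, first, last, base_address):
    """Write GPRs from sequential addresses, one register per step.

    Yields (register_just_written, registers_still_pending) after each step,
    so that a younger instruction could be held until its source register
    is no longer pending.
    """
    order = lm_register_range(first, last)
    pending = set(order)
    for index, reg in enumerate(order):
        address = base_address + index * WORD_BYTES
        gprs[reg] = int.from_bytes(memory[address:address + WORD_BYTES], "big")
        pending.discard(reg)
        yield reg, frozenset(pending)


if __name__ == "__main__":
    gprs = [0] * NUM_GPRS
    memory = bytes(range(16))
    for reg, pending in load_multiple(gprs, memory, 14, 1, 0):
        print(f"wrote r{reg} = {gprs[reg]:#010x}, still pending: {sorted(pending)}")
```

A dependent instruction that reads only the first register written would, in this model, be free to proceed after the first step rather than waiting for the entire multi-cycle sequence to finish.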
Address Generation Interlock (AGI) detects and resolves read-after-write (RAW) dependencies in which an instruction writes a general purpose register that is later read by a younger instruction. The younger instruction may access the general purpose register during an address generation (AGEN) stage to obtain a value required to calculate the address for operand access. In a typical microprocessor pipeline, the address generation stages are usually earlier than, and possibly decoupled from, the execution stages. Therefore, an AGI penalty may be incurred where a younger instruction's operand address generation becomes dependent on an older instruction's execution write back. It is also common in an in-order pipelined microprocessor for the actual GPR update to be performed some pipeline stages after the intended update value is calculated during execution. One reason is that an in-order processor does not have renaming capability, so such updates, even with results ready, must happen only when the execution path is confirmed to be unconditional, i.e., not under any unresolved branches. In these cases, where results are ready, the ability to bypass results to a consuming address generation stage before the GPRs are written becomes very important for performance. In a microprocessor where long or complex instructions, such as LM-type instructions, are not cracked into individual micro-instructions during decode or dispatch, there is added complexity in tracking and supporting the bypass of individual GPR updates from execution to address generation.
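The difference between waiting for the GPR write and bypassing a ready result to address generation can be illustrated with the following sketch. It is a simplified timing model, not a description of any actual interlock logic: the `PendingWrite` record, its fields and the `agen_cycle` function are hypothetical, and a value is assumed to be usable in the cycle after it becomes available.

```python
from dataclasses import dataclass


@dataclass
class PendingWrite:
    reg: int                 # GPR targeted by an older, in-flight instruction
    result_ready_cycle: int  # cycle in which execution computes the value
    gpr_write_cycle: int     # later cycle in which the GPR is actually updated


def agen_cycle(requested_cycle, source_regs, pending_writes, bypass):
    """Earliest cycle in which a younger instruction can perform address generation.

    Without bypass, AGEN waits until after the GPR write; with bypass, it waits
    only until after the result is computed (assumed one-cycle forwarding).
    """
    earliest = requested_cycle
    for write in pending_writes:
        if write.reg in source_regs:  # read-after-write dependency on a base or index register
            available = write.result_ready_cycle if bypass else write.gpr_write_cycle
            earliest = max(earliest, available + 1)
    return earliest


if __name__ == "__main__":
    pending = [PendingWrite(reg=5, result_ready_cycle=6, gpr_write_cycle=9)]
    no_bypass = agen_cycle(4, {5}, pending, bypass=False)
    with_bypass = agen_cycle(4, {5}, pending, bypass=True)
    print("AGI penalty without bypass:", no_bypass - 4)
    print("AGI penalty with bypass:   ", with_bypass - 4)
```

In this model, the gap between `result_ready_cycle` and `gpr_write_cycle` corresponds to the stages during which an in-order processor holds a ready result until the execution path is confirmed, and that gap is the portion of the AGI penalty that a bypass can recover.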
Therefore, what are needed are techniques for improving result handling in a microprocessor. Preferably, the techniques provide for reduction of stall or wait conditions by facilitating earlier use of instruction results.