This invention relates generally to computer systems and more particularly to processing of instructions in pipelined multiple execution processors.
As is known in the art, computer systems have become ubiquitous. In particular, one widely employed type of computer system is the so-called pipelined processor. In a pipelined processor, instruction processing is decomposed into assembly-line-like stages. Illustratively, a pipeline includes an instruction fetch stage, in which instructions are fetched in one or several cycles from a cache memory; and an instruction decode stage, in which an instruction's op code, that is, the portion of the code which determines the function of the instruction, is examined to ascertain the function of the instruction and thus the resources needed by the instruction. Illustrative resources may include general purpose registers within the CPU, access to internal buses as well as external buses, and functional units such as I/O units, arithmetic logic units, and so forth. A third stage is typically the instruction issue stage, in which resource availability is checked for each instruction and resources are reserved for particular instructions. The fourth stage of a typical pipeline is the execution stage, in which instructions are executed in one or several execution stages, with results written into the general purpose registers during the last execution stage.
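The four stages described above can be illustrated with a minimal cycle-by-cycle sketch. This is an idealized illustration only, not any particular processor design; the stage names and the representation of instructions as simple strings are assumptions made for the example.

```python
# Minimal sketch of a four-stage pipeline (fetch, decode, issue, execute).
# Each cycle, every in-flight instruction advances one stage and, if any
# instructions remain, a new one enters the fetch stage. Stage names and
# the instruction format are illustrative assumptions.

STAGES = ["fetch", "decode", "issue", "execute"]

def run_pipeline(program, cycles):
    """Advance each in-flight instruction one stage per cycle."""
    in_flight = []          # list of [instruction, stage_index] pairs
    completed = []
    next_instr = 0
    trace = []
    for _ in range(cycles):
        # Advance instructions already in the pipeline.
        for entry in in_flight:
            entry[1] += 1
        # Retire instructions that have finished the execute stage.
        completed += [i for i, s in in_flight if s >= len(STAGES)]
        in_flight = [e for e in in_flight if e[1] < len(STAGES)]
        # Fetch the next instruction, if any remain.
        if next_instr < len(program):
            in_flight.append([program[next_instr], 0])
            next_instr += 1
        trace.append([(i, STAGES[s]) for i, s in in_flight])
    return completed, trace
```

With the pipeline kept full, one instruction completes per cycle once the pipeline is primed, which is the behavior the later paragraphs assume.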
In an ideal pipelined processor, time is measured in CPU clock periods. In theory, the clock period for a P-stage pipeline would be 1/P the clock period of a non-pipelined equivalent, since the non-pipelined equivalent would have P-1 fewer stages of execution for each instruction. Thus, with the pipelined approach there is the potential for a P-times improvement in throughput or performance over a conventional non-pipelined architecture.
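The P-times claim can be checked with a back-of-the-envelope calculation. The cycle-count formula below is the standard idealized one (no stalls, one instruction issued per cycle) and is an assumption of this sketch, not part of the text above.

```python
# Ideal pipeline speedup: with P stages and N instructions, a pipelined
# machine needs roughly P + (N - 1) cycles, versus N * P stage-times for
# the non-pipelined equivalent. As N grows, the ratio approaches P.
def ideal_speedup(p_stages, n_instructions):
    unpipelined = n_instructions * p_stages
    pipelined = p_stages + (n_instructions - 1)
    return unpipelined / pipelined
```

For a single instruction the ratio is 1 (pipelining buys nothing); for a long instruction stream it approaches the P-times figure cited above.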
There are, however, several practical limitations on pipeline performance which prevent a pipelined processor from achieving the P-times throughput improvement. One particular limitation on practical performance is instruction dependencies. Instruction dependencies arise when instructions depend on the results of previous instructions and may therefore have to wait for the previous instructions to complete before they can proceed through the pipeline. Two types of instruction dependencies may be identified.
The first is the so-called data dependency, which occurs when instructions use the same input and/or output operands, as for example when an instruction uses the result of a preceding instruction as an input operand. A data dependency may cause an instruction to wait in the pipeline for a preceding instruction to complete. A control dependency, on the other hand, occurs when a control decision, such as for example a conditional branch decision, must be made before subsequent instructions can be executed.
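The data dependency described above can be detected mechanically by comparing the registers one instruction writes against those a later instruction reads. The tuple format (opcode, source registers, destination register) is an illustrative assumption of this sketch.

```python
# Sketch of detecting a data dependency between two instructions: a
# read-after-write dependency exists when a later instruction reads a
# register that an earlier instruction writes. The instruction tuple
# format (opcode, sources, destination) is an illustrative assumption.

def raw_dependency(earlier, later):
    """True if `later` reads the register that `earlier` writes."""
    _, _, earlier_dest = earlier
    _, later_srcs, _ = later
    return earlier_dest in later_srcs

add1 = ("add", ("R7", "R8"), "R9")   # R9 <- R7 + R8
add2 = ("add", ("R9", "R2"), "R4")   # reads R9: depends on add1
assert raw_dependency(add1, add2)
assert not raw_dependency(add2, add1)
```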
When an instruction dependency occurs, all instructions following the instruction being executed are blocked from executing. Typically, the dependent instruction is stalled in a pipeline stage and does not proceed further until the instruction ahead of it proceeds, or is held at the issue stage until all resource and data dependencies for the particular instruction are satisfied. When a large number of instruction dependencies occur in a program executed by a pipelined processor, the performance improvement implicit in a pipelined processor is reduced accordingly.
One technique known in the art for overcoming instruction dependencies is so-called instruction scheduling. An important characteristic of instruction scheduling in pipelined processors is that by using equivalent but reordered code sequences, the pipelined processor can provide improved performance by eliminating many of the so-called instruction dependencies. This is often done in an instruction scheduler, in which the instructions are reordered by some algorithm to eliminate register conflicts that appear to the hardware as dependencies and to reduce the occurrence of data dependencies by executing instructions in a rescheduled order.
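A minimal greedy list scheduler illustrates the reordering idea: each cycle, the first instruction whose operands are ready and that no earlier unscheduled instruction blocks is issued, so an independent instruction can fill a slot that would otherwise stall. The instruction format, the two-cycle result latency, and the greedy policy are all assumptions of this sketch, not a description of any particular scheduler.

```python
# Greedy list-scheduling sketch. Instructions are (opcode, sources,
# destination) tuples; results become available `latency` cycles after
# issue. Both conventions are illustrative assumptions.

def conflicts(a, b):
    """True if instruction b cannot be moved ahead of instruction a."""
    _, a_srcs, a_dst = a
    _, b_srcs, b_dst = b
    return (a_dst in b_srcs        # b reads what a writes (RAW)
            or b_dst in a_srcs     # b overwrites what a reads (WAR)
            or a_dst == b_dst)     # both write the same register (WAW)

def schedule(instructions, latency=2):
    """Each cycle, issue the first ready, unblocked instruction."""
    remaining = list(instructions)
    ready_at = {}                  # register -> cycle its value is ready
    ordered, cycle = [], 0
    while remaining:
        for idx, instr in enumerate(remaining):
            _, srcs, dst = instr
            # Must stay behind any earlier instruction it conflicts with.
            if any(conflicts(prev, instr) for prev in remaining[:idx]):
                continue
            if all(ready_at.get(r, 0) <= cycle for r in srcs):
                ordered.append((cycle, instr))
                remaining.pop(idx)
                ready_at[dst] = cycle + latency
                break
        cycle += 1
    return ordered

program = [
    ("add", ("R1", "R2"), "R3"),
    ("add", ("R3", "R4"), "R5"),   # depends on the first instruction
    ("add", ("R6", "R7"), "R8"),   # independent of both
]
issued = schedule(program)
# The independent instruction is issued in the cycle that the dependent
# instruction would otherwise have spent stalled.
```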
With pipelined computer systems in general, there is a delay before actions are taken while the various pipelines within the computer system are filled. Thus, it is desirable to keep the pipeline full so that each stage is continuously processing instructions.
To keep the pipeline full, the pipelined processor often requires circuits to provide predictions, such as which sequence of instructions to follow when processing a branch type of instruction. If the prediction concerning the instruction sequence is correct, execution of the instructions follows without incident. However, if a wrong path of the instruction sequence is followed, after a conditional branch instruction for example, then the pipeline must be backed up to the conditional branch instruction and the instructions of the correct path must be loaded into the pipeline. For execution, all of the operands that were in use in the pipeline prior to the execution of the conditional branch must be restored to the pipeline.
One of the problems that arises with backing up the pipeline is that the values of the registers used in the execution of the instructions will, in general, change between the time the conditional branch instruction is performed and the time the error condition indicating that the branch taken was incorrect is recognized.
In modern computers, registers are used to store the values of operands. A typical instruction processed by a pipeline will add the values stored in two different registers and place the resulting value in one of the two registers or in a third register. For example, an instruction may be "add R7, R8>R9". This instruction is interpreted to mean that the contents of register 7 and the contents of register 8 are added together and the result is placed in register 9.
The problem with pipelined systems, conditional branches, and the use of registers is that often a subsequent instruction calls for use of a register that was changed after an earlier instruction passed through the pipeline. Thus, as an example, assume that a subsequent instruction following the instruction "add R7, R8>R9" is the instruction "add R1, R2>R7". By executing this second, subsequent instruction, the contents of register 7 have been changed. If it is necessary to back the instruction stage up to the first instruction "add R7, R8>R9", as a result of the conditional branch or another condition in the central processing unit, the contents of R7, which were changed by the second subsequent instruction "add R1, R2>R7", will produce an incorrect result. Thus, in general it would not be possible to back the instruction pipeline up to the instruction "add R7, R8>R9" and provide a valid result in register R9. That is, if the first instruction, in which the contents of register 7 and register 8 are added, were to be performed after execution of the second instruction due to a back-up of the pipelined processor, the resulting value stored in R9 would be different from the value which should have been stored in R9.
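The hazard described above can be made concrete with a short sketch that executes the two add instructions, backs up, and replays the first one. The initial register values are illustrative assumptions.

```python
# Sketch of the back-up hazard: once a later instruction overwrites R7,
# replaying "add R7, R8>R9" after backing up the pipeline produces a
# different (incorrect) value in R9. Register values are illustrative.

regs = {"R1": 10, "R2": 20, "R7": 5, "R8": 3}

def add(regs, src1, src2, dst):
    regs[dst] = regs[src1] + regs[src2]

add(regs, "R7", "R8", "R9")      # first instruction: R9 = 5 + 3 = 8
correct_r9 = regs["R9"]

add(regs, "R1", "R2", "R7")      # subsequent instruction clobbers R7

# The pipeline backs up and replays the first instruction:
add(regs, "R7", "R8", "R9")      # now R9 = 30 + 3, no longer 8
assert regs["R9"] != correct_r9
```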
As is also known in the art, in pipelined computer systems many instructions can be processed through the pipeline before a misprediction or other trap condition is recognized by control logic in the pipelined computer. The value contained in any one of the registers used by the set of instructions may thus have changed many times before the trap condition is recognized. Therefore, there is a need to save results generated at certain pipeline stages until the complete pipeline has run its course, in the event that a trap or other condition occurs which requires the pipeline to be backed up to a previously known state to re-execute a previously executed instruction.
One approach to saving the results is "register renaming." In any processor execution box there are a plurality of registers, for example 32, which can be used by instructions executed by the pipeline. These registers are referred to as logical registers. With register renaming, each logical register is mapped into a physical home located in a pool of registers which is larger than the number of logical registers. Each result generated for a logical register is given a new physical home in the pool of registers. Typically, the pool of registers is kept in a register file. Thus, multiple generations of the same logical register will be available, held in different physical homes in the register file. In order to implement a register renaming scheme, an apparatus must be provided to determine which physical home holds the latest value for each logical register. This apparatus is generally referred to as a register map.
In addition, it is possible to restore the register map to the state that the map was in at some prior time, in order to effectively back the pipeline up to that state and process instructions again from that state.
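The register map and its restoration can be sketched as follows. This is a minimal illustration of the renaming idea under stated assumptions (a free-list allocation policy, a 64-entry physical pool, and a full-copy snapshot), not a description of the invention's apparatus.

```python
# Minimal register-renaming sketch: each write to a logical register is
# given a fresh physical register from a free pool, and a register map
# records which physical register holds the latest value of each logical
# register. Pool size and the snapshot scheme are illustrative
# assumptions.

class RegisterMap:
    def __init__(self, n_physical=64):
        self.free = list(range(n_physical))  # pool of physical registers
        self.map = {}                        # logical -> current physical

    def rename_dest(self, logical):
        """Allocate a new physical home for a result."""
        phys = self.free.pop(0)
        self.map[logical] = phys
        return phys

    def lookup(self, logical):
        """Physical home holding the latest value of a logical register."""
        return self.map[logical]

    def snapshot(self):
        """Copy of the map, usable to restore on a pipeline back-up."""
        return dict(self.map)

rmap = RegisterMap()
p1 = rmap.rename_dest("R9")   # first generation of R9
saved = rmap.snapshot()
p2 = rmap.rename_dest("R9")   # second generation: a new physical home
assert p1 != p2
rmap.map = saved              # back the map up to the earlier state
assert rmap.lookup("R9") == p1
```

Because both generations of R9 remain in distinct physical homes, restoring the saved map recovers the earlier value without the clobbering problem described above.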
One further problem with pipelined processors is the latency between the time an instruction is fetched and the time that the instruction is sent to an execution box. With an instruction scheduler and a register mapper, instructions are generally scheduled with registers already mapped. One problem with this approach, however, is that a large number of cycles are required to pass through a typical register mapper and instruction scheduler. This increases the time required to traverse this portion of the instruction processor. It is desirable to shorten this time to improve overall processor performance, particularly for replays and restarts of instruction processing, in order to maintain the performance advantages of the pipelined processor.