This invention relates generally to computer systems and more particularly to reordering of instructions in pipelined processors.
As is known in the art, computer systems have become ubiquitous. In particular, one type of computer system widely employed is the so-called pipelined processor. In a pipelined processor, processor actions are decomposed into a plurality of steps in order to increase throughput. As an example of pipelining, a pipelined instruction unit decomposes instruction processing into assembly-line-like stages. Illustratively, a pipeline includes an instruction fetch stage, in which instructions are fetched in one or several cycles from a cache memory, and an instruction decode stage, in which the op code of the instruction, that is, the portion of the code which determines the function of the instruction, is examined to ascertain the function as well as the resources needed by the instruction. Illustrative examples of resources include general purpose registers within the CPU, access to internal buses and external buses, and functional units such as I/O units and arithmetic logic units. A third stage is typically the instruction issue stage, in which resource availability is checked for each instruction and resources are reserved for particular instructions. Generally, operands are read from registers during the issue stage. The fourth stage of a typical parallel pipeline computer system is the execution stage, in which instructions are executed in one or several pipelined execution stages, typically writing results into general purpose registers during the last execution stage.
In an ideal pipelined processor, time is measured in CPU clock periods. In theory, the clock period for a P-stage pipeline would be 1/P of the clock period for a non-pipelined equivalent, since the non-pipelined equivalent would have P-1 fewer stages of execution for the instruction and would therefore complete each instruction in a single, longer clock period. Thus, with the pipelined approach there is the potential for a P-fold increase in throughput, or performance improvement, over a conventional non-pipelined architecture.
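The arithmetic behind the P-fold figure can be sketched as follows. This is a minimal illustration that assumes an ideal pipeline with no stalls and measures time in pipelined clock periods; the function names `pipeline_cycles` and `ideal_speedup` are hypothetical and do not come from the text above.

```python
def pipeline_cycles(num_instructions: int, stages: int) -> int:
    """Clock periods needed to finish a run of instructions on an ideal
    pipeline (assumption: one instruction enters per period, no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs `stages` periods to drain through the
    # pipeline; each later instruction completes one period after the last.
    return stages + (num_instructions - 1)


def ideal_speedup(num_instructions: int, stages: int) -> float:
    """Speedup over a non-pipelined equivalent whose single clock period is
    as long as `stages` pipelined clock periods."""
    unpipelined = num_instructions * stages
    return unpipelined / pipeline_cycles(num_instructions, stages)
```

Under these assumptions a single instruction gains nothing (`ideal_speedup(1, 4)` is 1.0), while a long run of instructions approaches, but never reaches, a speedup of P.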
There are, however, several practical limitations on pipeline performance which prevent a pipelined processor from achieving a P-fold throughput improvement.
One particular limitation on practical performance is instruction dependencies. Instruction dependencies occur when instructions that are next to execute depend upon the results of previous instructions which have not yet finished executing. Therefore, the instructions that are next to execute have to wait for the previous instructions to complete before they can proceed through the pipeline.
Two types of instruction dependencies may be identified. The first is the so-called data dependency, which occurs when instructions use the same input and/or output operands, as for example when an instruction uses the result of a preceding instruction as an input operand. A data dependency may cause an instruction to wait in the pipeline for the preceding instruction to complete. A control dependency, on the other hand, occurs when a control decision, such as a conditional branch decision, must be made before subsequent instructions can be executed.
When an instruction dependency occurs, all instructions following the dependent instruction are blocked from executing. Typically, the dependent instruction is stalled in its pipeline stage and does not proceed further until the instruction ahead of it proceeds, or it is held at the issue stage until all resource and data dependencies for the particular instruction are satisfied. When a large number of instruction dependencies occur in a program executed by a pipelined processor, the performance advantages implicit in a pipelined processor are reduced accordingly.
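The cost of such a stall can be illustrated with a minimal sketch of an in-order issue model. The `Instr` record, the register names, and the latencies are hypothetical and chosen only for illustration; the model tracks data dependencies on registers and ignores every other resource.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    dest: str              # register written by the instruction
    srcs: Tuple[str, ...]  # registers read by the instruction
    latency: int           # cycles until the result is available


def issue_cycles(program: Sequence[Instr]) -> List[int]:
    """Return the cycle at which each instruction issues, in program order."""
    ready = {}      # register -> first cycle at which its value can be read
    cycles = []
    next_issue = 0  # earliest cycle the next instruction may issue
    for ins in program:
        # Registers never written in this block are assumed ready at cycle 0.
        operands_ready = max((ready.get(r, 0) for r in ins.srcs), default=0)
        issue = max(next_issue, operands_ready)  # stall until operands exist
        cycles.append(issue)
        ready[ins.dest] = issue + ins.latency
        next_issue = issue + 1  # in order: later instructions cannot pass
    return cycles


# r1 is produced slowly, r2 needs r1, and r3 is independent of both.
program = [
    Instr("r1", ("r0",), 3),
    Instr("r2", ("r1",), 1),
    Instr("r3", ("r0",), 1),
]
stalled = issue_cycles(program)  # [0, 3, 4]
```

In this example the independent third instruction issues at cycle 4 only because it sits behind the stalled second instruction.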
One technique known in the art to overcome the occurrence of instruction dependencies is so-called instruction scheduling. An important characteristic of pipelined processors is that, by using equivalent but reordered code sequences, the pipelined processor can provide improved performance by eliminating many of the so-called instruction dependencies. This is often done in an instruction scheduler, in which the instructions are reordered by some algorithm to eliminate register conflicts that appear to the hardware as dependencies and to reduce the occurrence of data dependencies by executing instructions in a rescheduled order.
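As an illustration only, the reordering described above can be sketched as a simple greedy list scheduler. This is not the algorithm of any particular scheduler: the `Instr` record, the single-issue machine model, and the conservative treatment of every register conflict as a full dependency are all assumptions made for the sketch.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Instr(NamedTuple):
    name: str
    dest: str              # register written
    srcs: Tuple[str, ...]  # registers read
    latency: int           # cycles until the result is available


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` must stay after `earlier` in the reordered code."""
    read_after_write = earlier.dest in later.srcs
    write_after_read = later.dest in earlier.srcs
    write_after_write = later.dest == earlier.dest
    return read_after_write or write_after_read or write_after_write


def list_schedule(program: Sequence[Instr]) -> List[str]:
    """Greedily issue one instruction per cycle, preferring program order."""
    n = len(program)
    preds = {i: {j for j in range(i) if depends_on(program[i], program[j])}
             for i in range(n)}
    finish = {}  # index -> cycle at which that instruction's result exists
    order = []
    remaining = set(range(n))
    cycle = 0
    while remaining:
        # An instruction may issue once all its predecessors have finished.
        issuable = [i for i in sorted(remaining)
                    if all(j in finish and finish[j] <= cycle
                           for j in preds[i])]
        if issuable:
            chosen = issuable[0]
            order.append(program[chosen].name)
            finish[chosen] = cycle + program[chosen].latency
            remaining.remove(chosen)
        cycle += 1  # an empty `issuable` list models a stall cycle
    return order


block = [
    Instr("A", "r1", ("r0",), 3),  # slow producer of r1
    Instr("B", "r2", ("r1",), 1),  # needs r1
    Instr("C", "r3", ("r0",), 1),  # independent of A and B
    Instr("D", "r4", ("r3",), 1),  # needs r3
]
reordered = list_schedule(block)  # ["A", "C", "D", "B"]
```

Here the independent instructions C and D are moved into the cycles that B would otherwise spend waiting for A, so the reordered sequence computes the same values with fewer stall cycles.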
One class of instructions which presents particular problems is memory reference instructions. Memory reference instructions are those instructions which cause data to be sent to or retrieved from a memory device such as cache memory or main memory. The problem associated with reordering memory reference instructions is that of data dependencies, of which there are three types. One type is the so-called store-store dependency, which occurs when a series of instructions produce values that are to be written to the same memory location and thus must be written in proper order. A second type is the so-called store-load dependency, which occurs when a value has to be stored in a memory location before the value can be loaded from the location. The third type is the so-called load-store dependency, which arises when a value must be loaded from a memory location before a subsequent store to the memory location.
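The three dependency types can be expressed compactly in a short sketch. It assumes, purely for illustration, that the address of each reference is already known as an integer; the `MemOp` record and the `memory_dependency` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class MemOp(NamedTuple):
    kind: str     # "load" or "store"
    address: int  # assumed known here, for illustration only


def memory_dependency(earlier: MemOp, later: MemOp) -> Optional[str]:
    """Name the dependency that keeps `later` after `earlier`, if any."""
    if earlier.address != later.address:
        return None  # different locations: the two may be reordered
    if earlier.kind == "load" and later.kind == "load":
        return None  # two loads never conflict
    return f"{earlier.kind}-{later.kind}"
```

For example, a store followed by a load of the same location yields `"store-load"`, while two references to different locations yield `None`.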
The general problem with reordering memory reference instructions is that at the time of instruction reordering or rescheduling, generally there is insufficient information available to determine the address of the area in memory which may be affected. This lack of specific information thus makes reordering of memory reference instructions difficult. However, reordering of memory reference instructions would improve processor performance in pipelined processors which include some sort of instruction scheduling.
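The difficulty can be illustrated with a sketch of the conservative test a scheduler might fall back on when addresses are expressed as a base register plus an offset. This is an assumption-laden illustration, not a method taken from the text: the `MemRef` record and `may_conflict` function are hypothetical, and the sketch assumes the base register is not modified between the two references.

```python
from typing import NamedTuple


class MemRef(NamedTuple):
    kind: str    # "load" or "store"
    base: str    # base register; its run-time value is unknown when scheduling
    offset: int  # constant byte offset from the base register
    size: int    # number of bytes accessed


def may_conflict(a: MemRef, b: MemRef) -> bool:
    """Conservatively decide whether two references must keep their order."""
    if a.kind == "load" and b.kind == "load":
        return False  # loads can always be reordered with each other
    if a.base != b.base:
        # Different base registers may hold any addresses at run time, so
        # the scheduler has to assume the two references could overlap.
        return True
    # Same (assumed unmodified) base register: compare the byte ranges.
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size
```

Only when both references share a base register can the sketch prove that they are independent; in every other case involving a store it must leave the original order in place, which is the lost scheduling opportunity described above.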