1. Field of the Invention
This invention is related to the field of processors and, more particularly, to instruction scheduling mechanisms in processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by issuing and executing multiple instructions per clock cycle and by employing the highest possible clock frequency consistent with the design. One method for increasing the number of instructions executed per clock cycle is out of order execution. In out of order execution, instructions may be executed in a different order than that specified in the program sequence (or “program order”). Certain instructions near each other in a program sequence may have dependencies which prohibit their concurrent execution, while subsequent instructions in the program sequence may not have dependencies on the previous instructions. Accordingly, out of order execution may increase performance of the superscalar processor by increasing the number of instructions executed concurrently (on the average).
Unfortunately, scheduling instructions for out of order execution presents additional hardware complexities for the processor. The term “scheduling” generally refers to selecting an order for executing instructions. Typically, the processor attempts to schedule instructions as rapidly as possible to maximize the average instruction execution rate (e.g. by executing instructions out of order to deal with dependencies and hardware availability for various instruction types). These complexities may limit the clock frequency at which the processor may operate. In particular, the dependencies between instructions must be respected by the scheduling hardware. Generally, as used herein, the term “dependency” refers to a relationship between a first instruction and a subsequent second instruction in program order which requires the execution of the first instruction prior to the execution of the second instruction. A variety of dependencies may be defined. For example, an operand dependency occurs if a source operand of the second instruction is the destination operand of the first instruction.
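The operand dependency defined above may be illustrated with a minimal software sketch. The sketch is not the hardware described herein; the names `Instr` and `operand_dependencies`, the example instructions, and the simplification that a source operand depends only upon the most recent prior writer of its register are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record using register numbers as operands."""
    name: str
    dests: Tuple[int, ...]  # destination register numbers
    srcs: Tuple[int, ...]   # source register numbers


def operand_dependencies(program: List[Instr]) -> List[Tuple[int, int]]:
    """Return (first, second) index pairs where the second instruction
    reads a register that the first instruction writes.

    Assumption: only the most recent prior writer of a register is
    reported as the producer.
    """
    last_writer: Dict[int, int] = {}  # register number -> index of latest writer
    deps: List[Tuple[int, int]] = []
    for index, instr in enumerate(program):
        # Check the sources before recording the destinations, so an
        # instruction never depends upon itself.
        for reg in instr.srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], index))
        for reg in instr.dests:
            last_writer[reg] = index
    return deps


# Illustrative program: the second instruction reads r1, which the first writes.
program = [
    Instr("add r1, r2, r3", dests=(1,), srcs=(2, 3)),
    Instr("sub r4, r1, r5", dests=(4,), srcs=(1, 5)),
    Instr("mul r6, r7, r8", dests=(6,), srcs=(7, 8)),
]
deps = operand_dependencies(program)
```

In this example the third instruction has no operand dependency upon either prior instruction, so it may be a candidate for execution out of order.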
Generally, instructions may have one or more source operands and one or more destination operands. The source operands are input values to be manipulated according to the instruction definition to produce one or more results (which are the destination operands). Source and destination operands may be memory operands stored in a memory location external to the processor, or may be register operands stored in register storage locations included within the processor. The instruction set architecture employed by the processor defines a number of architected registers. These registers are defined to exist by the instruction set architecture, and instructions may be coded to use the architected registers as source and destination operands. An instruction specifies a particular register as a source or destination operand via a register number (or register address) in an operand field of the instruction. The register number uniquely identifies the selected register among the architected registers. A source operand is identified by a source register number and a destination operand is identified by a destination register number.
In addition to operand dependencies, one or more types of ordering dependencies may be enforced by a processor. Ordering dependencies may be used, for example, to simplify the hardware employed or to generate correct program execution. By forcing certain instructions to be executed in order with respect to other instructions, hardware for handling consequences of the out of order execution of the instructions may be omitted. For example, if load memory operations are allowed to be performed out of order with respect to store memory operations, hardware may be required to detect a prior store memory operation which updates the same memory location accessed by a subsequent load memory operation (which may have been performed out of order). Generally, ordering dependencies may vary from microarchitecture to microarchitecture.
Scheduling becomes increasingly difficult to perform at high frequency as larger numbers of instructions are allowed to be “in flight” (i.e. outstanding within the processor). Dependencies between instructions may be more frequent due to the larger number of instructions which have yet to be completed. Furthermore, detecting the dependencies among the large number of instructions may be more difficult, as may be detecting when the dependencies have been satisfied (i.e. when the progress of one instruction has proceeded to the point that the dependency need not prevent the scheduling of another instruction). A scheduling mechanism amenable to high frequency operation is therefore desired.
Additionally, a scheduling mechanism is desired which may handle the large variety of ordering dependencies that may be imposed by the microarchitecture. The ordering dependencies, in addition to the operand dependencies, may result in a particular instruction being dependent upon a relatively large number of prior instructions. Accordingly, a flexible scheduling mechanism allowing for a wide variety of dependencies is desired.
The problems outlined above are in large part solved by a processor employing an instruction queue and dependency vectors therein which allow a flexible dependency recording structure. The dependency vector includes a dependency indication for each instruction queue entry, which may advantageously provide a universal mechanism for scheduling instruction operations. An arbitrary number of dependencies may be recorded for a given instruction operation, up to a dependency upon each other instruction operation. Since the dependency vector is configured to record an arbitrary number of dependencies, a given instruction operation can be ordered with respect to any other instruction operation. Accordingly, any architectural or microarchitectural restrictions upon concurrent execution or upon order of particular instruction operations in execution may be enforced. If, during the development of a processor implementation, it becomes desirable to add additional execution order restrictions (e.g. to simplify the implementation), the additional restrictions may be accommodated by indicating ordering dependencies within the dependency vector. The instruction queues evaluate the dependency vectors and request scheduling for each instruction operation for which the recorded dependencies have been satisfied. The enhanced flexibility may improve the suitability of the instruction queues for a variety of processor implementations.
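A dependency vector having one dependency indication per instruction queue entry may be modeled in software as a bit mask. The following is a sketch under stated assumptions: the queue size `QUEUE_ENTRIES`, the function names, and the use of a single `satisfied_mask` to track satisfied entries are hypothetical and are not taken from the description above.

```python
QUEUE_ENTRIES = 8  # hypothetical instruction queue size


def make_dependency_vector(operand_deps, ordering_deps=()):
    """Build a dependency vector with one bit per instruction queue entry.

    Bit i is set if the instruction operation depends upon the operation
    in queue entry i, whether through an operand or an ordering dependency.
    """
    vector = 0
    for entry in (*operand_deps, *ordering_deps):
        if not 0 <= entry < QUEUE_ENTRIES:
            raise ValueError(f"queue entry {entry} out of range")
        vector |= 1 << entry
    return vector


def dependencies_satisfied(vector, satisfied_mask):
    """Return True when every dependency recorded in the vector is satisfied.

    satisfied_mask has bit i set once the operation in queue entry i has
    progressed far enough that dependents need not wait for it.
    """
    return vector & ~satisfied_mask == 0


# Two operand dependencies (entries 0 and 2) plus one ordering
# dependency (entry 5) recorded in the same vector.
vector = make_dependency_vector(operand_deps=[0, 2], ordering_deps=[5])
```

Because an operand dependency and an ordering dependency set bits in the same vector, an additional execution order restriction may be accommodated in this model by setting one more bit, without changing the evaluation in `dependencies_satisfied`.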
Broadly speaking, the present invention contemplates a processor comprising a dependency vector generation unit and an instruction queue. The dependency vector generation unit is configured to generate a dependency vector corresponding to an instruction operation. Coupled to receive the dependency vector and the instruction operation, the instruction queue is configured to inhibit scheduling of the instruction operation until each dependency indicated within the dependency vector is satisfied. The dependency vector is capable of indicating dependencies upon an arbitrary number of other instruction operations within the instruction queue.
The present invention further contemplates a method for scheduling instruction operations in a processor. A dependency vector corresponding to each instruction operation is generated. The dependency vector indicates an arbitrary number of dependencies upon other instruction operations in an instruction queue. The dependency vector and a corresponding instruction operation are stored in the instruction queue. Each of the arbitrary number of dependencies indicated by the dependency vector is satisfied, and subsequently the corresponding instruction operation is scheduled (responsive to the satisfying of the dependencies).
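The steps of the method (generating a dependency vector, storing it with the instruction operation, and scheduling once the dependencies are satisfied) may be sketched as a small simulation. Several assumptions are made for illustration: the function name `schedule`, the example operations, the treatment of a dependency as satisfied in the cycle after the operation it names is scheduled, and the absence of any limit on the number of operations scheduled per cycle.

```python
def schedule(ops):
    """Simulate scheduling from an instruction queue.

    ops is a list of (name, dependency_entries) pairs; the position in the
    list is the queue entry number. Returns a list of cycles, each cycle
    being the list of operation names scheduled in that cycle.
    """
    # Generate a dependency vector for each operation and store it
    # alongside the operation in the (modeled) instruction queue.
    queue = [(name, sum(1 << d for d in set(deps))) for name, deps in ops]

    satisfied = 0                     # bit i set once entry i is satisfied
    pending = set(range(len(queue)))  # entries not yet scheduled
    cycles = []
    while pending:
        # An entry requests scheduling when no unsatisfied bit remains.
        ready = sorted(i for i in pending if queue[i][1] & ~satisfied == 0)
        if not ready:
            raise ValueError("no operation can be scheduled (circular dependency)")
        cycles.append([queue[i][0] for i in ready])
        for i in ready:
            pending.discard(i)
            satisfied |= 1 << i       # assumption: satisfied once scheduled
    return cycles


# Hypothetical operations: entry 1 has an operand dependency upon entry 0,
# and entry 3 (a load) has an ordering dependency upon entry 2 (a store).
example = [
    ("load A", []),
    ("add", [0]),
    ("store B", []),
    ("load C", [2]),
]
cycles = schedule(example)
```

In this sketch the two independent operations are scheduled together in the first cycle, and the two dependent operations follow in the second cycle once their recorded dependencies are satisfied.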