1. Field of the Invention
This invention is related to the field of processors and, more particularly, to parallelizing instruction execution within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors which employ wider issue rates. A "wide issue" superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a "narrow issue" superscalar processor. During clock cycles in which the number of dispatchable instructions exceeds the number the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
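The effect of issue width on the average dispatch rate can be illustrated with a simplified model. The following is a sketch under stated assumptions, not a description of any particular processor: the function name `average_dispatch_rate` and the per-cycle supply of dispatchable instructions are hypothetical, instructions not dispatched in a cycle are assumed to wait in an unbounded queue, and execution hardware is assumed never to stall dispatch.

```python
def average_dispatch_rate(ready_per_cycle, issue_width):
    """Average instructions dispatched per clock cycle in a simplified model.

    Assumptions (illustrative only): ready_per_cycle[i] instructions become
    dispatchable in cycle i; instructions not dispatched in a cycle wait in an
    unbounded queue; execution hardware never stalls dispatch.
    """
    pending = 0      # dispatchable instructions not yet dispatched
    dispatched = 0   # total instructions dispatched so far
    cycle = 0
    while cycle < len(ready_per_cycle) or pending > 0:
        if cycle < len(ready_per_cycle):
            pending += ready_per_cycle[cycle]
        issued = min(issue_width, pending)  # at most issue_width per cycle
        pending -= issued
        dispatched += issued
        cycle += 1
    return dispatched / cycle if cycle else 0.0


ready = [5, 1, 6, 0, 2, 4]  # hypothetical per-cycle supply of dispatchable instructions
print(average_dispatch_rate(ready, issue_width=2))  # narrow issue: 2.0
print(average_dispatch_rate(ready, issue_width=4))  # wide issue: 3.0
```

In this example the same 18 instructions take 9 cycles at an issue width of 2 but only 6 cycles at an issue width of 4, because the wider processor absorbs the cycles in which more than two instructions are dispatchable.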
Unfortunately, supporting wider issue rates generally requires a larger amount of execution hardware within the processor. If sufficient execution hardware is not provided, then the instruction throughput of the processor may suffer even though the processor is capable of issuing a large number of instructions concurrently. The execution hardware may occupy a substantial amount of semiconductor substrate area, increasing the overall die size of the processor and hence its cost.
Additionally, many instructions are relatively simple instructions which could be handled by simple execution hardware. For example, a move instruction which specifies only register operands (i.e. a move from a source register to a destination register) is a simple instruction requiring almost no hardware to execute. Move instructions having a memory operand and a register operand require an address generation but relatively little additional hardware. Furthermore, additive instructions (e.g. add/subtract/increment/decrement) having register operands are relatively simple instructions as well. These simpler instructions may also occur relatively frequently within common code sequences. However, the execution hardware must also be capable of executing the more complex instructions. Some superscalar processors have attempted to provide less costly execution hardware by providing both complex and simple execution units and controlling the issue of instructions to the execution units such that the simple execution units receive only simple instructions while the complex execution units receive either simple or complex instructions. While such a strategy may reduce the area occupied by the execution hardware, the issue logic becomes more complex. The more complex issue logic may occupy more area, or may limit the clock cycle time. Accordingly, a more efficient method for handling the mix of simple and complex instructions is desired.
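The issue strategy described above, in which simple execution units accept only simple instructions while complex units accept either kind, can be sketched as follows. This is a minimal illustration, assuming in-order issue and a hypothetical instruction representation (`Instr`, the opcode strings, and `issue_cycle` are not from the original); the classification follows the examples given above: moves with register operands or with a memory and a register operand, and additive instructions with register operands only.

```python
from dataclasses import dataclass

# Hypothetical opcode names chosen for illustration.
ADDITIVE_OPS = {"add", "sub", "inc", "dec"}


@dataclass(frozen=True)
class Instr:
    opcode: str
    has_memory_operand: bool = False


def is_simple(ins):
    """Simple: any move (register or memory+register), or additive with only register operands."""
    if ins.opcode == "mov":
        return True
    return ins.opcode in ADDITIVE_OPS and not ins.has_memory_operand


def issue_cycle(window, num_simple, num_complex):
    """In-order issue for one cycle; returns (opcode, unit kind) pairs."""
    simple_free, complex_free = num_simple, num_complex
    issued = []
    for ins in window:
        if is_simple(ins) and simple_free > 0:
            simple_free -= 1
            issued.append((ins.opcode, "simple"))
        elif complex_free > 0:
            # Complex units accept complex instructions and overflow simple ones.
            complex_free -= 1
            issued.append((ins.opcode, "complex"))
        else:
            break  # in-order issue stops at the first instruction with no eligible unit
    return issued


window = [Instr("mov"), Instr("add"), Instr("mul"), Instr("mov", True), Instr("div")]
print(issue_cycle(window, num_simple=2, num_complex=1))
# [('mov', 'simple'), ('add', 'simple'), ('mul', 'complex')]
```

Even this small sketch must classify each instruction and track free units of two kinds in program order, which hints at why such issue logic may grow in area or lengthen the clock cycle as the issue width increases.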
In order to support higher clock frequencies (i.e. shorter clock cycle times), superscalar processors have been employing longer pipelines (i.e. pipelines including more stages) as well as wider issue rates. While longer pipelines may allow higher clock frequencies to be achieved, longer pipelines present additional design challenges as well. More particularly, because greater numbers of instructions may be fetched and placed into the pipeline before previous instructions complete execution, additional forwarding hardware may be required to support parallel execution. For example, more instructions may progress beyond the operand fetch stage prior to the execution of previous instructions. If these instructions are dependent upon the previous instructions, their operands may not be available when they reach the operand fetch stage. These instructions may be allowed to progress to subsequent pipeline stages if forwarding hardware is provided to route the operands to the instructions as they progress through the pipeline toward execution. Unfortunately, the forwarding hardware may be costly in terms of both area and complexity. A more efficient solution for providing operands to dependent instructions is therefore desired.
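The operand selection performed at the operand fetch stage can be sketched as a search of older in-flight instructions for the newest producer of a source register. This is a simplified illustration under assumed names (`InFlight`, `fetch_operand`, and the register file dictionary are hypothetical): if the producer has executed, its result is forwarded; if it has not, this sketch simply reports a stall, whereas the deeper forwarding hardware described above would instead let the dependent instruction advance and capture the result later.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class InFlight:
    dest_reg: int
    result: Optional[int] = None  # None until the producing instruction has executed


def fetch_operand(src_reg: int, in_flight: List[InFlight],
                  reg_file: Dict[int, int]) -> Tuple[str, Optional[int]]:
    """Select the source of one operand at the operand fetch stage.

    in_flight lists older, not-yet-retired instructions from oldest to newest.
    The newest older producer of src_reg supplies the value.
    """
    for producer in reversed(in_flight):  # one comparison per in-flight instruction
        if producer.dest_reg == src_reg:
            if producer.result is None:
                return ("stall", None)  # producer has not executed yet
            return ("forwarded", producer.result)
    return ("register_file", reg_file[src_reg])


reg_file = {1: 10, 2: 20, 3: 30}
in_flight = [InFlight(2, 99), InFlight(2), InFlight(3, 7)]
print(fetch_operand(1, in_flight, reg_file))  # ('register_file', 10)
print(fetch_operand(2, in_flight, reg_file))  # ('stall', None)
print(fetch_operand(3, in_flight, reg_file))  # ('forwarded', 7)
```

In hardware the loop corresponds to comparators between each source operand specifier and each in-flight destination, so the number of comparisons grows with both pipeline depth and issue width.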
As used herein, the term "dependency" refers to a relationship between a first instruction and a subsequent second instruction in which the second instruction requires execution of the first instruction prior to execution of the second instruction. For example, the second instruction may include a source operand which is generated via execution of the first instruction. Generally, an operand is a value operated upon during execution of an instruction. The operands for a particular instruction are located via operand specifiers encoded into the instruction. For example, certain operands may be stored in registers employed within the processor. A register operand specifier encoded into the instruction selects the particular register storing the operand. The register operand specifier may also be referred to as a register address or a register number. On the other hand, other instructions may specify a memory operand stored in a memory location within a main memory to which the processor is coupled. The memory address is specified via operand specifiers as well. For example, the instruction may include a displacement which identifies the memory location storing the memory operand. Other instructions may include address operand specifiers which specify register operands used to form the memory address. An operand is a source operand if the operand is an input value for the instruction. An operand is a destination operand if the operand is the result of the instruction. The destination operand specifier specifies the storage location in which the result of executing the instruction is to be stored.
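The definitions above can be made concrete with a small sketch that detects register dependencies between two instructions and forms a memory address from a displacement and address registers. The `Instruction` fields, `depends_on`, and `effective_address` are hypothetical names chosen for illustration; the sketch considers register operands only and ignores dependencies through memory.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Instruction:
    dest_reg: Optional[int] = None   # destination register operand specifier
    src_regs: Tuple[int, ...] = ()   # source register operand specifiers
    addr_regs: Tuple[int, ...] = ()  # register operands used to form a memory address
    displacement: int = 0            # displacement encoded in the instruction


def depends_on(second: Instruction, first: Instruction) -> bool:
    """True if `second` reads a register that `first` writes (register operands only)."""
    if first.dest_reg is None:
        return False
    return first.dest_reg in second.src_regs or first.dest_reg in second.addr_regs


def effective_address(ins: Instruction, regs: Dict[int, int]) -> int:
    """Memory address = displacement + values of the address registers."""
    return ins.displacement + sum(regs[r] for r in ins.addr_regs)


i1 = Instruction(dest_reg=3, src_regs=(1, 2))                 # r3 <- r1 op r2
i2 = Instruction(dest_reg=4, src_regs=(3,))                   # reads r3: depends on i1
i3 = Instruction(dest_reg=5, addr_regs=(3,), displacement=8)  # load from [r3 + 8]
i4 = Instruction(dest_reg=6, src_regs=(1,))                   # independent of i1
print(depends_on(i2, i1), depends_on(i3, i1), depends_on(i4, i1))  # True True False
print(hex(effective_address(i3, {3: 0x100})))                 # 0x108
```

Note that `i3` depends on `i1` even though register 3 is used only to form an address, since address operand specifiers name source operands of the instruction.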