1. Field of the Invention
This invention is related to the field of processors and, more particularly, to mechanisms for enhancing parallelism within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by simultaneously executing multiple instructions in a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a processor perform their intended functions. For example, superscalar processors are typically configured with instruction processing pipelines which process instructions. The processing of instructions includes the actions of fetching, dispatching, decoding, executing, and writing back results. Each action may be implemented in one or more pipeline stages, and an instruction flows through each of the pipeline stages where an action or portion of an action is performed. At the end of a clock cycle, the instruction and the values resulting from performing the action of the current pipeline stage are moved to the next pipeline stage. When an instruction reaches the end of an instruction processing pipeline, it is processed and the results of executing the instruction have been recorded.
A problem associated with executing a large number of instructions concurrently is that instructions often have dependencies on instructions prior to them in program order. As used herein, the term "dependency" or "dependent" refers to the condition in which an instruction receives the result of executing a previous instruction as one of its operands. In other words, the dependent instruction operates on the result of the previous instruction. Generally speaking, instructions which are dependent on a previous instruction do not execute in parallel with that previous instruction. Instead, the previous instruction executes, the result is forwarded to the dependent instruction, and the dependent instruction executes in a subsequent clock cycle.
In many cases, instruction dependencies limit the number of instructions which may be executed in a given clock cycle. It is not uncommon in programs for a particular instruction to be dependent on an instruction immediately prior to the particular instruction or to be dependent on an instruction two instructions prior to the particular instruction. Further, it is not uncommon in programs for a majority of the instructions to be dependent in this way. This type of program severely limits the number of instructions which may be executed concurrently.
While a problem in any instruction set, programs written using the x86 instruction set (also referred to as IA-32 or APX) frequently are even more sensitive to the problem of dependencies limiting parallelism. For example, due to the relatively small number of registers available in the x86 instruction set, many operands are stored on a memory stack pointed to by the ESP register. Accordingly, the ESP is an operand of many instructions. Furthermore, many instructions update the ESP register as well (e.g. pushing and popping values on the stack). Accordingly, instructions may exhibit a chain of dependencies on the ESP register, limiting overall concurrent execution. Other instruction sets may not specify a dedicated stack pointer register such as ESP, but software may employ a stack model in which a register is used as a stack pointer register. Such a model may exhibit chained dependencies as well.
As used herein, an "operand" is a value operated upon by an instruction. Source operands are input values to be operated upon in response to the instruction to produce a result, which is the destination operand. Operands may be register operands if they are stored in registers internal to the processor, or memory operands if they are stored in a memory location external to the processor. Register operands are specified by a register address which may be directly encoded in the instruction or may be implicit in the definition of the instruction assigned to a particular opcode. Memory operands are specified by a memory address which may be specified via one or more address operands of the instruction (e.g. a displacement coded into the instruction, one or more register operands, etc.).