1. Field of the Invention
This invention relates generally to the field of microprocessors and, more particularly, to execution units within microprocessors.
2. Description of the Related Art
Microprocessors are typically designed with a number of "execution units" that are each optimized to perform a particular set of functions or instructions. For example, one or more execution units within a microprocessor may be optimized to perform memory accesses, i.e., load and store operations. Other execution units may be optimized to perform general arithmetic and logic functions, e.g., shifts and compares. Many microprocessors also have specialized execution units configured to perform more complex floating-point arithmetic operations including multiplication and reciprocal operations. These specialized execution units typically comprise hardware that is optimized to perform one or more floating-point arithmetic functions.
Many instructions in a microprocessor are configured to perform a function based on one or more operands. These operands may originate from a variety of sources including registers, a cache, or a main memory. Obtaining an operand from a cache or main memory often involves an operand latency. The operand latencies can translate into instruction latencies for instructions that depend on the operand. It is generally desirable to reduce these operand and instruction latencies in a microprocessor to achieve increased processor efficiency.
Instructions that are configured to load data into a destination operand are often referred to as load instructions. Load instructions typically specify a memory location as a source operand and copy data from the memory location into a destination operand. At times, the destination operand of a load instruction will be used as a source operand of an instruction subsequent to the load instruction. The source operand of the subsequent instruction can create a dependency on the destination register of the load instruction. As a result, the subsequent instruction may be required to wait until the load instruction executes or completes to access the contents of the destination register of the load instruction. The time that the instruction waits for the load instruction to execute or complete can result in an instruction latency. It would be desirable to reduce the latencies associated with instructions that specify source operands that depend on the destination operand of a load instruction.
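By way of illustration only, the load-use latency described above may be sketched with a toy in-order issue model. The model is not part of any disclosed hardware; the `Instr` record, the `issue_cycles` function, the register names, and the three-cycle load latency are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dest: str      # destination operand (hypothetical register name)
    srcs: tuple    # source operands
    latency: int   # assumed cycles from issue until the result is available


def issue_cycles(program):
    """Return the earliest issue cycle of each instruction, in program order."""
    ready_at = {}  # operand name -> cycle at which its value becomes available
    cycles = {}
    for slot, ins in enumerate(program):
        # An instruction issues no earlier than its in-order slot and no
        # earlier than the cycle at which every source operand is ready.
        start = max([slot] + [ready_at.get(src, 0) for src in ins.srcs])
        cycles[ins.name] = start
        ready_at[ins.dest] = start + ins.latency
    return cycles


LOAD_LATENCY = 3  # assumed value, for illustration only

program = [
    Instr("load", "r1", ("mem",), LOAD_LATENCY),  # r1 <- [mem]
    Instr("add", "r2", ("r1", "r3"), 1),          # r2 <- r1 + r3, depends on r1
]
cycles = issue_cycles(program)
stall = cycles["add"] - 1  # cycles the add waits beyond its in-order slot
```

Under these assumptions the dependent add cannot issue until the load result is available, so the add waits `LOAD_LATENCY - 1` cycles beyond its in-order slot. This wait is the instruction latency that the background identifies.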
The problems outlined above are in large part solved by an apparatus and method described herein. Generally speaking, an apparatus and method for superforwarding load operands in a microprocessor are provided. An execution unit in a microprocessor is configured to receive a load instruction and a subsequent instruction. If the load instruction corresponds to a simple load instruction, a destination operand of the load instruction can be superforwarded to a subsequent instruction if the subsequent instruction specifies a source operand that depends on the destination operand of the load instruction. The subsequent instruction is not required to wait until the load instruction executes or completes and can be scheduled and/or executed prior to or at the same time as the load instruction. Consequently, latencies associated with operand dependencies may be reduced.
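By way of illustration only, the eligibility check summarized above may be sketched as follows. The sketch assumes one possible way of modeling superforwarding in software, namely that the dependent instruction is rewritten to name the source of the load directly. That modeling choice, the `Instr` record, the `simple_load` flag, and the `superforward` function are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    srcs: tuple
    simple_load: bool = False  # hypothetical marker for a "simple" load


def superforward(load, subsequent):
    """Return `subsequent` with its dependence on `load` removed when eligible.

    Assumed model: the dependent source operand is replaced by the load's own
    source, so the subsequent instruction no longer waits on the load's
    destination. Ineligible pairs are returned unchanged.
    """
    if load.op != "load" or not load.simple_load:
        return subsequent  # only simple loads qualify
    if load.dest not in subsequent.srcs:
        return subsequent  # no dependency on the load's destination operand
    new_srcs = tuple(load.srcs[0] if s == load.dest else s
                     for s in subsequent.srcs)
    return replace(subsequent, srcs=new_srcs)


load = Instr("load", "r1", ("[mem]",), simple_load=True)
add = Instr("add", "r2", ("r1", "r3"))
forwarded = superforward(load, add)

complex_load = Instr("load", "r1", ("[mem]",), simple_load=False)
not_forwarded = superforward(complex_load, add)
```

In this sketch `forwarded` reads `[mem]` and `r3` and therefore carries no dependency on `r1`, whereas `not_forwarded` is returned unchanged because the load is not marked as simple.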