1. Field of the Invention
This invention relates to microprocessors, and more particularly, to a method of data forwarding from a store instruction to a load instruction during out-of-order execution.
2. Description of the Relevant Art
A modern microprocessor may include one or more processor cores, or processors, each capable of executing instructions of a software application. Modern processors are pipelined; that is, they comprise one or more data processing stages connected in series, with storage elements placed between the stages. On each transition of a clock signal, the output of one stage becomes the input of the next stage. Level-sensitive latches may be used as storage elements in a pipeline at a phase boundary, that is, at a portion of a clock cycle. Edge-sensitive flip-flops may be used as storage elements in a pipeline at a cycle boundary. The portion of an instruction's execution performed within a pipeline stage is the work done by that stage's integrated circuits between successive clock cycle boundaries. Ideally, every clock cycle produces useful execution in each stage of the pipeline.
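The pipeline organization described above can be illustrated with a small software model. The following Python sketch is a simplification under stated assumptions: each stage is a hypothetical pure function, each inter-stage storage element is a list slot updated once per call to `clock()` (mimicking an edge-triggered flip-flop at a cycle boundary), and `None` marks an empty slot, or bubble. The class and method names are illustrative only.

```python
from typing import Callable, List, Optional

# A stage is modeled as a pure function of the value latched before it (assumption).
Stage = Callable[[int], int]


class Pipeline:
    """Linear pipeline: stages in series, one storage element after each stage."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = stages
        # regs[i] holds the value latched at the output of stage i; None is a bubble.
        self.regs: List[Optional[int]] = [None] * len(stages)

    def clock(self, new_input: Optional[int]) -> Optional[int]:
        """Apply one clock edge and return the value now held after the last stage."""
        # Walk from the last stage backward so every stage consumes the value its
        # predecessor latched on the previous edge, not the one latched on this edge.
        for i in range(len(self.stages) - 1, 0, -1):
            prev = self.regs[i - 1]
            self.regs[i] = None if prev is None else self.stages[i](prev)
        self.regs[0] = None if new_input is None else self.stages[0](new_input)
        return self.regs[-1]


# Example: three hypothetical stages, fed three inputs followed by two bubbles.
pipe = Pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
outputs = [pipe.clock(v) for v in [1, 2, 3, None, None]]
print(outputs)  # [None, None, 1, 3, 5]
```

With three stages, the first result emerges on the third clock edge; after that, one result emerges on every edge, which corresponds to the ideal of useful work in every stage on every cycle.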
At times, a data dependency stall occurs between two instructions when an operand of one instruction depends on the result of a preceding instruction. The stall can be avoided if the preceding instruction's result is forwarded from one pipeline stage to another as soon as it is ready. The dependent instruction then does not need to wait for the result to be written to, and subsequently read from, a register file.
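As a minimal sketch of the forwarding idea, assuming a single in-flight result and a register file modeled as a dictionary, operand selection reduces to a two-way choice. `InFlightResult` and `read_operand` are hypothetical names; a real bypass network compares source registers against many in-flight results across several pipeline stages.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InFlightResult:
    """Result of an older instruction that is ready but not yet written back."""
    dest_reg: int
    value: int


def read_operand(src_reg: int,
                 register_file: Dict[int, int],
                 bypass: Optional[InFlightResult]) -> int:
    """Select an operand: the forwarded result if its destination matches, else the register file."""
    if bypass is not None and bypass.dest_reg == src_reg:
        # Forwarding path: no wait for write-back followed by a register-file read.
        return bypass.value
    return register_file[src_reg]


# Example: an older "r3 = r1 + r2" has computed 30, but r3 in the register file is stale.
regs = {1: 10, 2: 20, 3: 0}
older = InFlightResult(dest_reg=3, value=regs[1] + regs[2])
print(read_operand(3, regs, older))  # 30 (forwarded)
print(read_operand(1, regs, older))  # 10 (from the register file)
```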
To further increase performance, modern microprocessors may issue, dispatch, and retire multiple instructions per clock cycle. A microprocessor may also execute the instructions of a software program in an order different from the order in which they appear in the program. Retirement nevertheless remains in program order, so that the architectural state is valid if an interrupt occurs. Out-of-order execution of multiple instructions per clock cycle may make data forwarding logic more complex and lengthen the time it needs to compute its result.
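Out-of-order completion combined with in-order retirement is commonly implemented with a reorder buffer. The Python sketch below is illustrative only: it assumes instructions are tagged with increasing program-order sequence numbers, omits operands and interrupt handling, and its class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class RobEntry:
    seq: int            # program-order sequence number
    done: bool = False  # set when execution finishes (possibly out of order)


class ReorderBuffer:
    """Instructions may complete in any order but retire only from the head."""

    def __init__(self) -> None:
        self.entries: Deque[RobEntry] = deque()
        self.by_seq: Dict[int, RobEntry] = {}

    def dispatch(self, seq: int) -> None:
        entry = RobEntry(seq)
        self.entries.append(entry)  # allocated in program order
        self.by_seq[seq] = entry

    def complete(self, seq: int) -> None:
        self.by_seq[seq].done = True

    def retire(self, width: int) -> List[int]:
        """Retire up to `width` completed instructions this cycle, oldest first."""
        retired: List[int] = []
        while self.entries and self.entries[0].done and len(retired) < width:
            entry = self.entries.popleft()
            del self.by_seq[entry.seq]
            retired.append(entry.seq)
        return retired


rob = ReorderBuffer()
for seq in range(4):
    rob.dispatch(seq)
rob.complete(2)
rob.complete(3)        # younger instructions finish first...
print(rob.retire(2))   # [] ...but cannot retire past unfinished instruction 0
rob.complete(0)
rob.complete(1)
print(rob.retire(2))   # [0, 1]
print(rob.retire(2))   # [2, 3]
```

The `width` parameter stands in for a retirement bandwidth of several instructions per clock cycle.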
Memory accesses, which comprise load and store instructions, are one type of instruction that a microprocessor executes. A load instruction reads a memory location and may copy its contents to a register in a register file, a reservation station, and/or a reorder buffer. A store instruction reads the contents of an on-chip register and writes them to a memory location. The memory may be an L1, L2, or L3 cache; system memory, such as RAM, for a single processor or for a group of processors in a processing node of a network; or a hard disk in a computer system. Accessing such memory may take substantially longer than accessing an on-chip queue. Therefore, a microprocessor may include an on-chip load-store queue that holds the data values of uncommitted load and store instructions.
When a load instruction is dispatched for execution, its address may be compared to all addresses in the store queue, or store buffer, which holds uncommitted store instructions. The data value that the load instruction needs may reside in the store queue rather than in a cache or other memory. Because of out-of-order execution, multiple entries in the store queue may have an address matching that of the load instruction. To determine which of these entries holds the data value to forward, a priority encoder may be used to identify the youngest uncommitted store that is older than the load in program order.
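The lookup described above can be sketched in Python as an address comparison that produces a match vector, followed by a priority encoder that selects the highest set bit. This is a simplified model under several assumptions: store queue entries are kept in a plain list allocated in program order (index 0 is the oldest uncommitted store, with no circular wraparound), `n_older` is the number of stores older than the load, and only exact, same-size address matches are considered. The function name `stlf_lookup` is hypothetical.

```python
from typing import List, Optional, Tuple


def stlf_lookup(store_queue: List[Tuple[int, int]],
                n_older: int,
                load_addr: int) -> Optional[int]:
    """Return forwarded store data for a load, or None if no older store matches.

    store_queue holds (address, data) pairs in program order, oldest first;
    only entries [0, n_older) are older than the load.
    """
    # Compare the load address with every older entry, building a match vector
    # in which bit i is set when entry i has the same address.
    match = 0
    for i in range(n_older):
        addr, _ = store_queue[i]
        if addr == load_addr:
            match |= 1 << i
    if match == 0:
        return None  # no match: the load obtains its data from the cache or memory
    # Priority encoder: the highest set bit is the youngest older matching store.
    youngest = match.bit_length() - 1
    return store_queue[youngest][1]


sq = [(0x100, 11), (0x200, 22), (0x100, 33), (0x100, 44)]
print(stlf_lookup(sq, 3, 0x100))  # 33: entry 3 is younger than the load and is ignored
print(stlf_lookup(sq, 4, 0x100))  # 44
print(stlf_lookup(sq, 1, 0x200))  # None: the only store to 0x200 is younger
```

In hardware the address comparisons can proceed in parallel, but the priority encoder still sits in series after them, which is why it adds delay to the forwarding path.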
The store-to-load forwarding (STLF) path may be one of the critical timing paths in a processor core. Adding a priority encoder to the STLF path increases the delay of this path and, ultimately, may limit the maximum operating frequency of the processor core. Overall computing performance may suffer as a result.
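To make the timing concern concrete, the clock period must be at least as long as the slowest register-to-register path. The decomposition below is a generic, illustrative one with assumed component delays: clock-to-output delay of the launching storage element $t_{cq}$, address comparison $t_{cmp}$, priority encoding $t_{pe}$, data selection $t_{mux}$, and setup time of the capturing storage element $t_{su}$.

```latex
T_{\text{clk}} \;\ge\; T_{\text{STLF}} = t_{cq} + t_{cmp} + t_{pe} + t_{mux} + t_{su},
\qquad
f_{\max} \;\le\; \frac{1}{T_{\text{STLF}}}
```

Because $t_{pe}$ adds in series with the other terms, a slower priority encoder directly lowers the bound on $f_{\max}$ whenever the STLF path is the critical one.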
In view of the above, an efficient method for achieving data forwarding from a store instruction to a load instruction during out-of-order execution is desired.