Many modern microprocessors allow instructions to execute out-of-order. This improves performance because it allows more instructions to complete in the same amount of time by efficiently distributing instructions among the computing resources of the microprocessor.
Problems may occur, however, when executing load and store instructions out-of-order. When a load instruction issues before an older store instruction referencing the same address, the load may retrieve an incorrect value because the store data the load should use is not yet present at the address. When hardware detects this condition, it performs a squash in which results from the load instruction and subsequent instructions are ignored and the instructions must be re-executed (replayed). Such hardware recovery degrades processor performance.
Memory reference tagging stores have been proposed in which information regarding the squash is used to predict subsequent memory instruction collisions, and consequently to prevent reordering of certain instructions as appropriate. This general method is described in U.S. Pat. No. 5,619,662 (Steely), xe2x80x9cMemory Reference Taggingxe2x80x9d, dated Apr. 8, 1997 (hereinafter ""662 Patent), and incorporated herein by this reference in the entirety.
The ""662 Patent describes a technique of detecting dependent out-of-order load and store instructions using a write buffer, which keeps track of executed load and store instructions until it is determined they have executed in their proper order. Four different approaches to handling such an out-of-order detection are described:
In the first approach, part of the referenced (or target) memory address is used as a tag to be associated with each of the out-of-order instructions. If these instructions later appear again in the instruction queue, the fact that they have identical tags will cause them to be issued in order.
The second approach uses an assigned xe2x80x9cproblem numberxe2x80x9d as a tag. Two instructions referencing the same memory address and with the same problem number are not re-ordered.
The third approach simply associates a tag bit with an instruction to indicate that other memory reference instructions should not be re-ordered around the tagged instruction.
Finally, the fourth approach turns off reordering for some number of instructions when entering a subroutine.
U.S. Pat. No. 5,615,350 (Hesson), xe2x80x9cApparatus to Dynamically Control the Out-of-Order Execution of Load-Store Instructions in a Processor Capable of Dispatching, Issuing and Executing Multiple Instructions in a Single Processor Cycle,xe2x80x9d dated Mar. 25, 1997 describes a xe2x80x9cstore barrierxe2x80x9d cache that is used to predict, upon fetching of a store instruction, whether execution of the store instruction will lead to an out-of-order violation. Upon such a prediction, execution of all load instructions is delayed until the store instruction executes.
Another solution to the problem of reordering dependent load instructions is to remember recently squashed loads, and to force those loads to wait for all prior stores the next time they are fetched.
Each of the above approaches suffers a performance degradation because the load is delayed for many more stores then is necessary. Squashes are expensive in that they consume valuable time re-executing code and therefore slow the net instruction execution in a processor. The techniques described above attempt to predict out-of-order load/store violations to avoid such costly out-of-ordering, but these techniques over-compensate by forcing a load to wait unnecessarily.
The present invention attempts to detect load/store dependencies early, and avoid even a first squash in many cases.
Accordingly, a method of reducing load/store execution order violations in an out-of-order processor comprises determining whether a source address of a load instruction is the same as a destination address of a store instruction on which execution the load instruction depends. If they are the same, then execution of the load instruction is delayed until execution of the store instruction. This determination can be made, for example, during a fetch cycle, by further determining whether any intervening instructions modify the contents of a register referenced by both the load instruction and the store instruction, where source and destination addresses are determined by offsets to a value contained within the register.
In a preferred embodiment, named or virtual registers are mapped to physical registers, in the mapping stage of an instruction pipeline. Each unmodified instance of a virtual register is mapped to a common physical register. Determining whether a source address of a load instruction is the same as a destination address of a store instruction is reduced to determining whether the physical registers used by the store and load instructions are the same, and whether the offsets used in the instructions are the same. This determination is performed during the mapping stage.
A table is provided, which has entries corresponding at least to instructions on which the load instruction may depend, and preferably corresponding to entries in the instruction queue. In each table entry corresponding to a store instruction, the store instruction""s destination address offset and physical register reference are saved. Determining whether physical registers used by the store and load instructions are the same comprises comparing the load instruction""s source address offset and physical reference with each of the table entries corresponding to store instructions.
Furthermore, a matrix is provided which has a row corresponding at least to each instruction on which the load instruction may depend, preferably to each entry in the instruction queue. In addition, each entry has an indicator for each instruction having an entry in the table. Thus, for an n-entry instruction queue, the matrix forms a nxc3x97n matrix, each row corresponding to an instruction in the instruction queue which may depend upon another instruction, and each column corresponding to an instruction in the instruction queue on which other instructions may depend. Upon determining that a load instruction is dependent upon an unexecuted store instruction, the indicator corresponding to the store instruction in the entry corresponding to the load instruction is xe2x80x9cmarked,xe2x80x9d while an indicator is xe2x80x9cunmarkedxe2x80x9d when the corresponding store instruction issues. Execution of any load instruction is delayed while any indicator in the load instruction""s corresponding entry is marked. When a store instruction executes, all indicators in the column corresponding to the store instruction are unmarked.