With every new generation of microprocessors, instruction window sizes are increasing. The term “instruction window” describes the set of processor instructions in various stages of execution at any given time. As newer generations of processors continue to increase the number of parallel execution units and the depth of pipelines, the number of processor instructions in the instruction window at any given time continues to increase.
Some processors with parallel execution units can execute many processor instructions in parallel, which improves program execution speed. Instructions enter the instruction window in “program order” as they are read from the software program. “Program order” is the sequential order of the processor instructions in the program. While in the instruction window, different instructions can be scheduled for execution by different execution units. As a result, instructions can be executed out of program order, or “out-of-order.” For some processor instructions, execution order matters. Examples include “load” instructions, which read the contents of memory locations, and “store” instructions, which update the contents of memory locations. For example, if a program contains a store instruction and a load instruction that specify the same memory location, and the store instruction precedes the load instruction in program order, the load instruction is said to be “dependent” on the store instruction. If the dependent load instruction accesses the memory location before the store instruction does, the load instruction reads the contents of the memory location prior to the update by the store instruction, and an error results. When a load instruction is dependent on a store instruction, the processor executing them should therefore respect the program order of these two instructions.
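The hazard described above can be made concrete with a small software model. This is a minimal sketch, not a processor design: memory is assumed to be a Python dictionary, instructions are assumed to be tuples, and the function name `run` is hypothetical. Executing the same two-instruction program in program order and then out of order shows the dependent load reading the stale value.

```python
def run(program, order, memory):
    """Execute the instructions of `program` in the given `order` (a list of
    indices) against a copy of `memory`; return the value seen by each load,
    keyed by the load's index in program order."""
    mem = dict(memory)          # hypothetical memory model: address -> value
    loaded = {}
    for i in order:
        op, addr, *rest = program[i]
        if op == "store":
            mem[addr] = rest[0]  # store updates the memory location
        else:                    # "load" reads the memory location
            loaded[i] = mem.get(addr, 0)
    return loaded

program = [
    ("store", 0x40, 7),  # index 0: store 7 to address 0x40
    ("load", 0x40),      # index 1: load from 0x40, dependent on index 0
]
initial = {0x40: 1}

print(run(program, [0, 1], initial))  # program order: load sees 7
print(run(program, [1, 0], initial))  # out-of-order: load sees stale 1
```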
FIG. 1 shows a prior art store queue. Store queue 100 is a fully associative queue that holds information from every store instruction in the instruction window. When a store instruction is read from the program, information from the store instruction is deposited in store queue 100, where it stays for as long as the store instruction is in the instruction window. Store queue 100 includes entries 130, each of which includes a store address (STA) field 110 and a store data (STD) field 120. When a store instruction is encountered in a program, an entry in store queue 100 is allocated, and STA field 110 and STD field 120 are filled in to the extent that the information is available. When a store instruction is completed, or “retired,” the corresponding entry is removed from store queue 100.
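The entry life cycle described above (allocate, fill in STA and STD as they become available, remove on retirement) can be sketched as follows. This is an illustrative software model under stated assumptions, not the hardware of FIG. 1: the class and method names (`StoreQueue`, `allocate`, `fill`, `retire`) and the use of a per-store sequence number `seq` as the key are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class StoreQueueEntry:
    sta: Optional[int] = None  # store address (STA field); None until known
    std: Optional[int] = None  # store data (STD field); None until known

class StoreQueue:
    def __init__(self) -> None:
        # Hypothetical key: seq, the store's position in program order.
        # Python dicts preserve insertion order, so entries iterate oldest first.
        self.entries: Dict[int, StoreQueueEntry] = {}

    def allocate(self, seq: int, sta: Optional[int] = None,
                 std: Optional[int] = None) -> None:
        """A store is read from the program: allocate an entry and fill in
        whichever fields are already known."""
        self.entries[seq] = StoreQueueEntry(sta, std)

    def fill(self, seq: int, sta: Optional[int] = None,
             std: Optional[int] = None) -> None:
        """Fill in STA and/or STD once that information becomes available."""
        entry = self.entries[seq]
        if sta is not None:
            entry.sta = sta
        if std is not None:
            entry.std = std

    def retire(self, seq: int) -> None:
        """The store completes ("retires"): remove its entry."""
        del self.entries[seq]

sq = StoreQueue()
sq.allocate(0, sta=0x40)   # address known, data not yet available
sq.allocate(1)             # neither field known yet
sq.fill(0, std=7)          # data for store 0 arrives
sq.retire(0)               # store 0 completes and leaves the queue
print(list(sq.entries))    # only store 1 remains
```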
When a load instruction is encountered in a program, store queue 100 is searched to determine whether it holds a store instruction upon which the load instruction depends. If one of entries 130 has an STA field 110 that matches the address corresponding to the load instruction, then the load instruction is dependent on that store instruction. If no such store instruction is found in store queue 100, the load instruction can be executed immediately without causing an erroneous out-of-order condition. This is called “memory disambiguation.” If a store instruction upon which the load instruction depends is found, then execution of the load instruction can be delayed, or the load instruction can be satisfied by reading the data value from store queue 100 rather than from the memory location. This is called “store data forwarding.”
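The search described above can be sketched as a single function that returns one of three outcomes: no dependence (execute the load now), forward the store data, or delay the load. This is a simplified model with assumptions the text does not state: entries are hypothetical `(seq, sta, std)` tuples, only stores older than the load in program order are considered, the youngest such matching store is chosen (the usual choice, since it wrote the value the load should see), and an entry whose STA is still unknown is simply treated as non-matching.

```python
from typing import List, Optional, Tuple

# Hypothetical entry layout: (seq, sta, std). seq is the store's position in
# program order; sta and std are None until known.
StoreEntry = Tuple[int, Optional[int], Optional[int]]

def disambiguate(queue: List[StoreEntry], load_seq: int, load_addr: int):
    """Search every store queue entry for the youngest store that precedes
    the load in program order and writes load_addr.

    Returns ("independent", None), ("forward", data) or ("delay", None).
    """
    match = None
    for seq, sta, std in queue:                 # examine every entry
        if seq < load_seq and sta == load_addr:
            if match is None or seq > match[0]:
                match = (seq, std)              # keep youngest older match
    if match is None:
        return ("independent", None)            # memory disambiguation: safe
    if match[1] is not None:
        return ("forward", match[1])            # store data forwarding
    return ("delay", None)                      # data not ready: delay load

queue = [(0, 0x40, 7), (2, 0x80, None), (3, 0x40, 9)]
print(disambiguate(queue, 1, 0x40))  # only store 0 precedes: forward 7
print(disambiguate(queue, 4, 0x40))  # stores 0 and 3 match: forward 9
print(disambiguate(queue, 4, 0x80))  # store 2 matches, no data yet: delay
print(disambiguate(queue, 4, 0xC0))  # no match: independent
```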
Because store queue 100 is fully associative, it is searched in its entirety each time a load instruction is encountered in a program, and this search takes time. For each load instruction, time is spent in a sequential search of store queue 100 to find any store instructions upon which the load instruction depends, and to find data to satisfy the load instruction with a store data forwarding operation. As instruction windows grow, store queue 100 grows with them, and so does the associated search time. When store queue 100 is very large, the search time can become long enough to cause performance problems.
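The scaling problem can be illustrated by counting address comparisons in a full search. This is a rough software proxy, assuming the sequential search the text describes and one comparison per entry; the function name `full_search` and the store addresses are hypothetical. The cost per load equals the number of entries, so it grows linearly as the store queue grows with the instruction window.

```python
def full_search(queue, load_addr):
    """Compare load_addr against the STA of every entry, as a complete
    search must; return (matching indices, number of comparisons)."""
    comparisons = 0
    matches = []
    for i, sta in enumerate(queue):
        comparisons += 1
        if sta == load_addr:
            matches.append(i)
    return matches, comparisons

for size in (16, 64, 256):
    queue = [0x1000 + 8 * i for i in range(size)]  # hypothetical STA values
    _, cost = full_search(queue, 0x40)
    print(size, cost)  # comparisons per load equal the queue size
```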
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for an alternate method and apparatus for providing memory disambiguation.