The present invention relates generally to processors having large instruction windows, and more specifically to memory disambiguation in processors having large instruction windows.
With every new generation of microprocessors, instruction window sizes are increasing. The term “instruction window” describes the set of processor instructions in various stages of execution at any given time. As newer generations of processors continue to increase the number of parallel execution units and the depth of pipelines, the number of processor instructions in the instruction window at any given time continues to increase.
Some processors with parallel execution units can execute many processor instructions in parallel. This improves program execution speed. Instructions enter the instruction window when they are read from the software program in “program order.” “Program order” is the sequential order of the processor instructions in the program. When in the instruction window, different instructions can be scheduled to be executed by different execution units. This can lead to instructions being executed out of program order, or “out-of-order.”
The execution order of some processor instructions can be important. Examples include “load” instructions that read the contents of memory locations, and “store” instructions that update the contents of memory locations. For example, if a load instruction and a store instruction that specify the same memory location exist in a program with the store instruction preceding the load instruction in program order, the load instruction is said to be “dependent” on the store instruction. If the dependent load instruction accesses the memory location before the store instruction, the load instruction will read the memory location contents prior to the update by the store instruction, and an error will result. When a load instruction is dependent on a store instruction, the program order of these two instructions should be respected by the processor executing them.
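The dependence described above can be illustrated with a minimal sketch. The address 0x100, the values 1 and 42, and the function name run are hypothetical and chosen only for illustration; the sketch models a single store and a single dependent load executed in either order.

```python
def run(order):
    """Execute one store and one dependent load in the given order.

    Returns the value observed by the load. The address and values
    are illustrative assumptions, not taken from the specification.
    """
    memory = {0x100: 1}   # location holds 1 before the store executes
    observed = {}

    def store():
        memory[0x100] = 42            # store updates the location

    def load():
        observed["value"] = memory[0x100]   # load reads the same location

    operations = {"store": store, "load": load}
    for name in order:
        operations[name]()
    return observed["value"]


in_order = run(["store", "load"])      # program order respected
out_of_order = run(["load", "store"])  # dependent load executed too early
```

When program order is respected the load observes the updated value; when the load runs first it observes the stale contents, which is the error the passage describes.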
FIG. 1 shows a prior art store queue. Store queue 100 is a fully associative queue that includes information from every store instruction that is in the instruction window. When a store instruction is read from the program, information from the store instruction is deposited in store queue 100, where it stays for as long as the store instruction is in the instruction window. Store queue 100 includes entries 130, each of which includes a store address (STA) field 110 and a store data (STD) field 120. When a store instruction is encountered in a program, an entry in store queue 100 is allocated, and to the extent that information is available to fill in STA field 110 and STD field 120, they are filled in. When a store instruction is completed, or “retired,” the corresponding entry is removed from store queue 100.
When a load instruction is encountered in a program, store queue 100 is searched to see if store queue 100 includes a store instruction upon which the load instruction depends. If one of entries 130 has an STA field 110 that matches the address corresponding to the load instruction, then the load instruction is dependent. If no such store instruction is found in store queue 100, the load instruction can be executed immediately without causing an erroneous out-of-order condition. This is called “memory disambiguation.” If a store instruction upon which the load instruction depends is found, then execution of the load instruction can be delayed, or the load instruction can be satisfied by reading the data value from store queue 100 rather than from the memory location. This is called “store data forwarding.” Store queue 100 is a fully associative queue that is completely searched each time a load instruction is encountered in a program. The search of store queue 100 takes time. For each load instruction encountered in a program, time is spent in a sequential search of store queue 100 to find any store instructions upon which the load instruction depends, and to find data to satisfy the load instruction with a store data forwarding operation. As instruction windows increase in size, store queue 100 grows and the associated search time increases. When store queue 100 is very large, the search time can become so large as to cause performance problems.
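The behavior of store queue 100 described above can be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware itself: the class and function names are hypothetical, entries whose STA field is not yet filled in are simply treated as non-matching, and when several entries match, the youngest one is chosen. The search also reports how many entries were examined, to show that the cost of each load grows with the number of stores in the window.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StoreEntry:
    sta: Optional[int] = None  # store address (STA); None until available
    std: Optional[int] = None  # store data (STD); None until available


class StoreQueue:
    """Software model of a store queue that is completely searched per load."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []  # oldest first (program order)

    def allocate(self, sta: Optional[int] = None,
                 std: Optional[int] = None) -> StoreEntry:
        entry = StoreEntry(sta, std)
        self.entries.append(entry)
        return entry

    def retire_oldest(self) -> StoreEntry:
        # A retired store leaves the queue.
        return self.entries.pop(0)

    def search(self, load_addr: int) -> Tuple[Optional[StoreEntry], int]:
        """Examine every entry; return (youngest match or None, entries examined)."""
        match = None
        examined = 0
        for entry in self.entries:
            examined += 1
            if entry.sta == load_addr:
                match = entry  # later (younger) matches overwrite earlier ones
        return match, examined


def execute_load(queue: StoreQueue, memory: Dict[int, int],
                 load_addr: int) -> Optional[int]:
    """Return the loaded value, or None if the load must be delayed."""
    entry, _ = queue.search(load_addr)
    if entry is None:
        return memory.get(load_addr, 0)  # no dependence: read memory
    if entry.std is not None:
        return entry.std                 # store data forwarding
    return None                          # dependent, data not ready: delay
```

Because search visits every entry regardless of the load address, doubling the number of stores in the window doubles the work per load in this model, which is the scaling problem the passage identifies.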
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for an alternate method and apparatus for providing memory disambiguation.
In one embodiment, a method of executing instructions in an out-of-order processor includes receiving a decoded instruction such as a store instruction that is configured to store a data value to a memory address. The instruction is inserted in a queue, and the instruction is also assigned to an entry in a set-associative buffer.
In another embodiment, a method of executing instructions includes issuing and removing a store instruction from a queue to a memory such that a data value associated with the store instruction is stored at a memory location specified by a memory address associated with the store instruction. The method also includes searching a set-associative buffer for an entry corresponding to the store instruction, and when the entry corresponding to the store instruction is found, removing the entry from the set-associative buffer.
In another embodiment, a method of executing instructions in an out-of-order processor includes receiving a first decoded instruction that is configured to load a data value from a memory address, and allocating an entry in a queue for the instruction. The method also includes searching a set of a set-associative buffer for a second instruction upon which the first instruction depends.
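The three method embodiments above (assigning a store to a set-associative buffer entry, removing the entry when the store issues to memory, and searching a single set for a load) can be sketched together as follows. The number of sets, the number of entries per set, the use of the low-order address bits as the set index, and all names are assumptions made for illustration; this passage does not specify them.

```python
from typing import List, Optional, Tuple

# Hypothetical geometry; not specified in the text.
NUM_SETS = 4
WAYS = 2


class SetAssociativeStoreBuffer:
    """Sketch of a buffer in which a load searches one set, not every entry."""

    def __init__(self, num_sets: int = NUM_SETS, ways: int = WAYS) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds (store_id, address, data) tuples, oldest first.
        self.sets: List[List[Tuple[int, int, int]]] = [[] for _ in range(num_sets)]

    def _index(self, addr: int) -> int:
        # Assumption: the set is selected by the address modulo the set count.
        return addr % self.num_sets

    def insert(self, store_id: int, addr: int, data: int) -> bool:
        """Assign a store to an entry; return False if its set is full."""
        target = self.sets[self._index(addr)]
        if len(target) >= self.ways:
            return False
        target.append((store_id, addr, data))
        return True

    def remove(self, store_id: int, addr: int) -> bool:
        """Remove the entry for a store that has issued to memory, if present."""
        target = self.sets[self._index(addr)]
        for position, entry in enumerate(target):
            if entry[0] == store_id:
                del target[position]
                return True
        return False

    def find_dependence(self, load_addr: int) -> Optional[Tuple[int, int]]:
        """Search only the set selected by the load address.

        Returns (store_id, data) for the youngest matching store, or None.
        """
        target = self.sets[self._index(load_addr)]
        for store_id, addr, data in reversed(target):
            if addr == load_addr:
                return store_id, data
        return None
```

In this sketch a load examines at most WAYS entries regardless of how many stores are in the instruction window, in contrast to a search of every entry in a fully associative queue.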
In another embodiment, a memory disambiguation apparatus includes a queue configured to hold all of the store instructions that are in an instruction window, and a set-associative buffer configured to hold a subset of the store instructions that are in the instruction window. In this embodiment, the set-associative buffer is organized in multiple sets, and each of the store instructions in the set-associative buffer has resolved memory addresses. Each of the multiple sets is configured to be searched for store instructions upon which a load instruction depends.
In another embodiment, a memory disambiguation apparatus includes a set-associative buffer arranged in sets, where each set includes buffer entries, and each of the buffer entries includes a tag field and a data field. In this embodiment, each of the buffer entries corresponds to a separate store instruction in an instruction window. Also included in this embodiment are overflow indicators, where each overflow indicator has a one-to-many relationship with the buffer entries in the set-associative buffer.
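The buffer organization in this embodiment can be sketched as follows. The sketch assumes, for illustration only, one overflow indicator per set (so that each indicator relates to the several entries of its set), a tag formed from the address bits not used as the set index, and an indicator that is set when a store cannot be placed because its set is full. How the indicator is consumed is not stated in this passage; the lookup below merely reports it alongside the result. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BufferEntry:
    tag: int   # tag field: address bits not used to select the set
    data: int  # data field: value to be stored


class TaggedStoreBuffer:
    """Sketch of sets of tagged entries with one overflow indicator per set."""

    def __init__(self, num_sets: int = 4, ways: int = 2) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.sets: List[List[BufferEntry]] = [[] for _ in range(num_sets)]
        # Assumption: one indicator covers all entries of one set.
        self.overflow: List[bool] = [False] * num_sets

    def split(self, addr: int) -> Tuple[int, int]:
        """Return (set index, tag) for an address."""
        return addr % self.num_sets, addr // self.num_sets

    def insert(self, addr: int, data: int) -> bool:
        """Place a store in its set, or mark the set as overflowed if full."""
        index, tag = self.split(addr)
        if len(self.sets[index]) >= self.ways:
            self.overflow[index] = True
            return False
        self.sets[index].append(BufferEntry(tag, data))
        return True

    def lookup(self, addr: int) -> Tuple[Optional[int], bool]:
        """Return (data of youngest matching entry or None, overflow flag of the set)."""
        index, tag = self.split(addr)
        hit = None
        for entry in self.sets[index]:
            if entry.tag == tag:
                hit = entry.data  # younger entries overwrite older matches
        return hit, self.overflow[index]
```

A miss in a set whose indicator is clear means, in this model, that no store to that address was ever turned away from the buffer; a miss in a set whose indicator is set is inconclusive.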