Field of the Invention
The present invention relates generally to processors, and in particular to methods and mechanisms for recording load and store operations in a load-store dependency predictor.
Description of the Related Art
Superscalar processors attempt to achieve high performance by issuing and executing multiple instructions per clock cycle and by employing the highest possible clock frequency consistent with the design. One way to increase the number of instructions executed per clock cycle is by performing out-of-order execution. In out-of-order execution, instructions may be executed in a different order than that specified in the program sequence (or “program order”).
Some processors may be as aggressive as possible in scheduling instructions out of order and/or speculatively in an attempt to maximize the performance gain realized. For example, it may be desirable to schedule load memory operations prior to older store memory operations, since load memory operations are more likely to have dependent instructions. However, in some cases, a load memory operation may depend on an older store memory operation (e.g., the store memory operation updates at least one byte accessed by the load memory operation). In such cases, the load memory operation executes incorrectly if it is executed prior to the store memory operation. If a load memory operation is executed prior to an older store memory operation on which the load depends, the processor may need to be flushed and redirected, which degrades processor performance.
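By way of illustration only, the dependency condition described above (an older store that updates at least one byte accessed by a younger load, where the load executes first) may be sketched as follows. The names `MemOp`, `seq`, `bytes_overlap`, `is_ordering_violation`, and the cycle parameters are hypothetical and are introduced solely for this sketch; they do not describe any particular processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """Hypothetical model of a memory operation (illustrative only)."""
    seq: int        # position in program order; a smaller value means older
    addr: int       # address of the first byte accessed
    size: int       # number of bytes accessed
    is_store: bool  # True for a store, False for a load


def bytes_overlap(a: MemOp, b: MemOp) -> bool:
    """Return True if the two operations touch at least one common byte."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


def is_ordering_violation(load: MemOp, store: MemOp,
                          load_exec_cycle: int, store_exec_cycle: int) -> bool:
    """Return True if the load executed before an older store it depends on.

    The cycle arguments are assumed execution times used only to express
    "executed prior to" in this sketch.
    """
    return (
        not load.is_store
        and store.is_store
        and store.seq < load.seq              # the store is older in program order
        and bytes_overlap(load, store)        # the store updates a byte the load reads
        and load_exec_cycle < store_exec_cycle  # the load ran first
    )
```

In this sketch, a load that overlaps an older store but executes after it is not flagged, nor is a load whose bytes do not overlap the store at all.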
An operation is older than another operation if the operation is prior to the other operation in program order. An operation is younger than another operation if it is subsequent to the other operation in program order. Similarly, operations may be indicated as being prior to or subsequent to other operations, or may be referred to as previous operations, preceding operations, subsequent operations, etc. Such references may refer to the program order of the operations. Furthermore, a “load memory operation” or “load operation” may refer to a transfer of data from memory or cache to a processor, and a “store memory operation” or “store operation” may refer to a transfer of data from a processor to memory or cache. “Load operations” and “store operations” may be more succinctly referred to herein as “loads” and “stores”, respectively.
A table may be used to store load-store pairs that have caused previous ordering violations. Mechanisms to identify loads and stores in such a table often rely on the respective program counter (PC) value. Searching for a full PC value in a fully associative structure may be costly in terms of the power used, and so alternatively, a portion of the PC may be used to identify loads and stores. However, if only a portion of a PC is utilized, some amount of aliasing between distinct PCs may occur. Furthermore, some load instructions are decoded into a sequence of multiple load operations and some store instructions are decoded into a sequence of multiple store operations. For these instructions, when an ordering violation occurs, it is difficult to differentiate between the multiple load and store operations to determine which specific load operation was dependent on which specific store operation.
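As a minimal sketch of the aliasing issue noted above, the following example models a table that records load-store pairs using only a portion of each PC. The tag width (`PC_TAG_BITS`), the choice of the low-order bits, and the names `pc_tag` and `PairTable` are assumptions made for illustration and are not drawn from any specific design.

```python
PC_TAG_BITS = 8  # assumed tag width; a real design may use a different portion of the PC


def pc_tag(pc: int, bits: int = PC_TAG_BITS) -> int:
    """Keep only the low-order `bits` of the PC (an illustrative choice)."""
    return pc & ((1 << bits) - 1)


class PairTable:
    """Hypothetical table of load-store pairs that caused ordering violations."""

    def __init__(self) -> None:
        # Each entry is a (load tag, store tag) pair built from partial PCs.
        self.entries: set[tuple[int, int]] = set()

    def record_violation(self, load_pc: int, store_pc: int) -> None:
        """Record a pair after an ordering violation has been detected."""
        self.entries.add((pc_tag(load_pc), pc_tag(store_pc)))

    def load_matches(self, load_pc: int) -> bool:
        """Return True if any recorded entry has a load tag matching this PC."""
        tag = pc_tag(load_pc)
        return any(load_tag == tag for load_tag, _ in self.entries)
```

Under these assumptions, after a violation is recorded for a load at PC 0x4010, an unrelated load at PC 0x5110 also matches, because both PCs share the same low-order 8 bits. A wider tag reduces such aliasing at the cost of a more expensive search.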