1. Field of the Invention
This invention is related to the field of processors and, more particularly, to store to load forward mechanisms within processors.
2. Description of the Related Art
Processors often include store queues to buffer store memory operations which have been executed but which are still speculative. The store memory operations may be held in the store queue until they are retired. Subsequent to retirement, the store memory operations may be committed to the cache and/or memory. As used herein, a memory operation is an operation specifying a transfer of data between a processor and a main memory (although the transfer may be completed in cache). Load memory operations specify a transfer of data from memory to the processor, and store memory operations specify a transfer of data from the processor to memory. Memory operations may be an implicit part of an instruction which includes a memory operation, or may be explicit load/store instructions. Load memory operations may be more succinctly referred to herein as “loads”. Similarly, store memory operations may be more succinctly referred to as “stores”.
While executing stores speculatively and queueing them in the store queue may allow for increased performance (by removing the stores from the instruction execution pipeline and allowing other, subsequent instructions to execute), subsequent loads may access the memory locations updated by the stores in the store queue. While processor performance is not necessarily directly affected by having stores queued in the store queue, performance may be affected if subsequent loads are delayed due to accessing memory locations updated by stores in the store queue. Often, store queues are designed to forward data stored therein if a load hits the store queue. As used herein, a store queue entry storing a store memory operation is referred to as being “hit” by a load memory operation if at least one byte updated by the store memory operation is accessed by the load memory operation.
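The "hit" condition defined above amounts to a byte-range overlap test. The following is a minimal sketch of that test, assuming each store queue entry records a starting address and a byte count; the names `StoreEntry` and `hits` are hypothetical and do not appear in the description.

```python
from dataclasses import dataclass


@dataclass
class StoreEntry:
    """Hypothetical store queue entry: a store covering [addr, addr + size)."""
    addr: int   # address of the first byte updated by the store
    size: int   # number of bytes updated by the store


def hits(store: StoreEntry, load_addr: int, load_size: int) -> bool:
    """Return True if at least one byte updated by the store is accessed by the load."""
    # Two half-open byte ranges overlap when each one starts before the other ends.
    return store.addr < load_addr + load_size and load_addr < store.addr + store.size
```

For example, a 4-byte store at address 0x100 is hit by a 4-byte load at 0x102 (bytes 0x102 and 0x103 are shared), but not by a 4-byte load at 0x104.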
To further increase performance, it is desirable to execute younger loads out of order with respect to older stores. The younger loads may often have no dependency on the older stores, and thus need not await the execution of the older stores. Since the loads provide operands for execution of dependent instructions, executing the loads allows for still other instructions to be executed. However, merely detecting hits in the store queue as loads are executing may not lead to correct program execution if younger loads are allowed to execute out of order with respect to older stores, since certain older stores may not have executed yet (and thus the store addresses of those stores may not be known and dependencies of the loads on the certain older stores may not be detectable as the loads are executed). Accordingly, hardware to detect scenarios in which a younger load executes prior to an older store on which that younger load is dependent may be required, and then corrective action may be taken in response to the detection. For example, instructions may be purged and refetched or reexecuted in some other suitable fashion. As used herein, a load is “dependent” on a store if the store updates at least one byte of memory accessed by the load, is older than the load, and is younger than any other stores updating that byte. Unfortunately, executing the load out of order improperly and the subsequent corrective actions to achieve correct execution may reduce performance.
It is noted that loads, stores, and other instruction operations may be referred to herein as being older or younger than other instruction operations. A first instruction is older than a second instruction if the first instruction precedes the second instruction in program order (i.e. the order of the instructions in the program being executed). A first instruction is younger than a second instruction if the first instruction is subsequent to the second instruction in program order.
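The definitions of "dependent", "older", and "younger" given above can be illustrated with a short software model. This is only a sketch under stated assumptions: program order is represented by a hypothetical sequence number `seq` (smaller means older), and `MemOp` and `stores_load_depends_on` are illustrative names, not part of the described hardware.

```python
from typing import Iterable, NamedTuple, Set


class MemOp(NamedTuple):
    """Hypothetical memory operation record."""
    seq: int    # position in program order; a smaller value means older
    addr: int   # address of the first byte accessed
    size: int   # number of bytes accessed


def stores_load_depends_on(load: MemOp, stores: Iterable[MemOp]) -> Set[int]:
    """Return the seq numbers of all stores on which the load is dependent."""
    stores = list(stores)
    result = set()
    for byte in range(load.addr, load.addr + load.size):
        # Stores that are older than the load and update this byte.
        writers = [s for s in stores
                   if s.seq < load.seq and s.addr <= byte < s.addr + s.size]
        if writers:
            # Only the youngest such store supplies this byte to the load.
            result.add(max(writers, key=lambda s: s.seq).seq)
    return result
```

With stores at sequence numbers 1 (4 bytes at 0x100), 2 (2 bytes at 0x100), and 5 (4 bytes at 0x100), a 4-byte load at 0x100 with sequence number 3 depends on stores 1 and 2: store 2 supplies the first two bytes, store 1 supplies the remaining two, and store 5 is younger than the load and is ignored.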
The problems outlined above are in large part solved by a processor as described herein. The processor generally may schedule and/or execute younger loads ahead of older stores. Additionally, the processor may detect and take corrective action for scenarios in which an older store interferes with the execution of the younger load. The processor employs a store to load forward (STLF) predictor which may indicate, for dispatching loads, a dependency on a store. The dependency is indicated for a store which, during a previous execution, interfered with the execution of the load. Since a dependency is indicated on the store, the load is prevented from scheduling and/or executing prior to the store. Performance may be increased due to the decreased interference between loads and stores.
The STLF predictor is trained with information for a particular load and store in response to executing the load and store and detecting the interference. Additionally, the STLF predictor may be untrained (e.g. information for a particular load and store may be deleted) if a load is indicated by the STLF predictor as dependent upon a particular store and the dependency does not actually occur. For example, in one embodiment, the STLF predictor is untrained if the load is indicated as dependent upon the particular store but store data is not forwarded from a store queue within the processor when the load executes.
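The training and untraining behavior described above may be sketched as follows. This is a simplified software model, not the hardware design: the table size, the modulo indexing by load PC, and the names `DependencyTable` and `on_load_executed` are assumptions made for illustration.

```python
class DependencyTable:
    """Hypothetical dependency table indexed by load PC."""

    def __init__(self, num_entries=1024):
        self.num_entries = num_entries
        self.entries = [None] * num_entries  # None models an invalid entry

    def _index(self, load_pc):
        # Assumed indexing scheme: low-order bits of the load PC.
        return load_pc % self.num_entries

    def train(self, load_pc, store_info):
        """Record the store that interfered with the load at load_pc."""
        self.entries[self._index(load_pc)] = store_info

    def untrain(self, load_pc):
        """Delete the information recorded for the load at load_pc."""
        self.entries[self._index(load_pc)] = None

    def lookup(self, load_pc):
        """Return (valid, store_info) for a dispatching load."""
        entry = self.entries[self._index(load_pc)]
        return (entry is not None, entry)


def on_load_executed(table, load_pc, predicted_dependent, data_forwarded):
    """Untrain if a dependency was predicted but no store queue data was forwarded."""
    if predicted_dependent and not data_forwarded:
        table.untrain(load_pc)
```

In this model, `train` would be called when interference is detected, and `on_load_executed` corresponds to the embodiment in which the absence of store queue forwarding triggers untraining.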
In one implementation, the STLF predictor records at least a portion of the PC of a store which interferes with the load in a first table indexed by the load PC. A second table maintains a corresponding portion of the store PCs of recently dispatched stores, along with tags identifying the recently dispatched stores. The PC of a dispatching load is used to select a store PC from the first table. The selected store PC is compared to the PCs stored in the second table. If a match is detected, the corresponding tag is read from the second table and used to indicate a dependency for the load.
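The two-table lookup described above can be modeled in software as shown below. This is a hedged sketch only: the table sizes, the number of PC bits retained, the modulo indexing, and the choice to return the most recently dispatched matching store when several match are all assumptions, and `PcStlfPredictor` is a hypothetical name.

```python
from collections import deque


class PcStlfPredictor:
    """Hypothetical model of the store-PC-based STLF predictor."""

    def __init__(self, table_size=1024, n_recent=8, pc_bits=10):
        self.table_size = table_size
        self.pc_mask = (1 << pc_bits) - 1
        # First table: indexed by load PC, holds a portion of the store PC.
        self.first = [None] * table_size
        # Second table: (partial store PC, tag) for recently dispatched stores.
        self.recent = deque(maxlen=n_recent)

    def train(self, load_pc, store_pc):
        """Record the store PC that interfered with the load."""
        self.first[load_pc % self.table_size] = store_pc & self.pc_mask

    def dispatch_store(self, store_pc, tag):
        """Remember a newly dispatched store and the tag identifying it."""
        self.recent.append((store_pc & self.pc_mask, tag))

    def dispatch_load(self, load_pc):
        """Return the tag of the store the load should wait for, or None."""
        partial = self.first[load_pc % self.table_size]
        if partial is None:
            return None  # no valid entry, so no dependency is indicated
        # Assumption: prefer the most recently dispatched matching store.
        for store_partial, tag in reversed(self.recent):
            if store_partial == partial:
                return tag
        return None
```

For instance, after `train(0x4010, 0x4000)` and `dispatch_store(0x4000, 7)`, a call to `dispatch_load(0x4010)` yields tag 7, which a scheduler could use to hold the load until that store has been scheduled.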
In another implementation, the STLF predictor records a difference between the tags assigned to a load and a store which interferes with the load in a first table indexed by the load PC. The PC of the dispatching load is used to select a difference from the table, and the difference is added to the tag assigned to the load. Accordingly, a tag of the store may be generated and a dependency of the load on the store may be indicated.
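The tag-difference approach above can likewise be sketched in a few lines. The sign convention (store tag minus load tag), the modulo indexing, and the use of unbounded integer tags without wraparound are assumptions made for illustration; `DeltaStlfPredictor` is a hypothetical name.

```python
class DeltaStlfPredictor:
    """Hypothetical model of the tag-difference STLF predictor."""

    def __init__(self, table_size=1024):
        self.table_size = table_size
        self.deltas = [None] * table_size  # None models an invalid entry

    def train(self, load_pc, load_tag, store_tag):
        """Record the difference between the store tag and the load tag."""
        # Assumed convention: delta = store tag - load tag, so that adding
        # the delta to a later load tag reproduces the store tag.
        self.deltas[load_pc % self.table_size] = store_tag - load_tag

    def dispatch_load(self, load_pc, load_tag):
        """Return the predicted tag of the interfering store, or None."""
        delta = self.deltas[load_pc % self.table_size]
        if delta is None:
            return None
        return load_tag + delta
```

If the load at PC 0x4010 was assigned tag 20 when a store with tag 17 interfered with it, the recorded difference is -3; when the same load later dispatches with tag 50, the predicted store tag is 47. Unlike the store-PC approach, this variant needs no second table, but it relies on the load and store keeping the same relative distance in tag order across executions.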
Broadly speaking, a store to load forwarding (STLF) predictor is contemplated. The STLF predictor comprises a dependency table and a dependency circuit coupled to the dependency table. The dependency table is configured to store a first indication of a first store memory operation which, during a previous execution, interfered with a first load memory operation. The dependency table is configured to output the first indication and a valid indication indicative of a validity of the first indication responsive to receiving a second indication of the first load memory operation. The dependency circuit is configured to indicate a dependency of the first load memory operation on the first store memory operation responsive to the valid indication.
Additionally, an STLF predictor is contemplated having a dependency table, a second table, and a dependency circuit. The dependency table is configured to store at least a portion of a first store program counter address (PC) corresponding to a first store memory operation which, during a previous execution, interfered with a first load memory operation. The dependency table is configured to output the portion of the first store PC and a valid indication indicative of a validity of the portion of the first store PC responsive to receiving at least a portion of a load PC corresponding to the first load memory operation. Coupled to receive the portion of the first store PC from the dependency table, the second table is configured to store corresponding portions of store PCs corresponding to N most recently dispatched store memory operations and tags identifying the N most recently dispatched store memory operations. The second table is configured to compare the portion of the first store PC to the corresponding portions of the store PCs and to generate hit signals in response to the compare. Coupled to the dependency table and to the second table, the dependency circuit is configured to indicate a dependency for the first load memory operation responsive to the valid indication and the hit signals.
Still further, a method is contemplated. A load memory operation is executed. An interference of the load memory operation by a store memory operation is detected. A dependency table within a store to load forward (STLF) predictor is updated with an indication of the store memory operation responsive to the detecting of the interference.
Moreover, a processor is contemplated. The processor comprises a scheduler and an STLF predictor including a dependency table and a dependency circuit. The dependency table is configured to store a first indication of a first store memory operation which, during a previous execution, interfered with a first load memory operation. The dependency table is configured to output the first indication and a valid indication indicative of a validity of the first indication responsive to receiving a second indication of the first load memory operation. Coupled to the dependency table, the dependency circuit is configured to signal a dependency of the first load memory operation on the first store memory operation responsive to the valid indication. Coupled to receive an indication of the dependency, the scheduler is configured to inhibit scheduling of the first load memory operation prior to scheduling the first store memory operation responsive to the indication of the dependency.