1. Field of the Invention
This invention is related to the field of processors and, more particularly, to the handling of store queue entry assignment in processors.
2. Description of the Related Art
Processors often include store queues to buffer store memory operations which have been executed but which are still speculative and/or have been retired but not yet committed to memory. The store memory operations may be held in the store queue until they are retired. Subsequent to retirement, the store memory operations may be committed to the cache and/or memory. As used herein, a memory operation is an operation specifying a transfer of data between a processor and a main memory (although the transfer may be completed in cache). Load memory operations specify a transfer of data from memory to the processor, and store memory operations specify a transfer of data from the processor to memory. Memory operations may be an implicit part of an instruction which includes a memory operation, or may be explicit load/store instructions. Load memory operations may be more succinctly referred to herein as "loads". Similarly, store memory operations may be more succinctly referred to as "stores".
While executing stores speculatively and queueing them in the store queue may allow for increased performance (by removing the stores from the instruction execution pipeline and allowing other, subsequent instructions to execute), subsequent loads may access the memory locations updated by the stores in the store queue. While processor performance is not necessarily directly affected by having stores queued in the store queue, performance may be affected if subsequent loads are delayed due to accessing memory locations updated by stores in the store queue. Furthermore, if a processor allows memory operations to be executed out of order, it is difficult to determine which of the stores in the store queue are older than a load (and hence the load may read bytes updated by the store) and which of the stores are younger than the load (and hence the load should not read the bytes updated by the store since it is prior to the store in program order). As used herein, a store queue entry storing a store memory operation is referred to as being "hit" by a load memory operation if at least one byte updated by the store memory operation is accessed by the load memory operation.
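The definition of a "hit" given above amounts to a byte-range overlap test. The following is a minimal software sketch of that test, offered for illustration only; the function name `store_hits_load` and the address/size representation are assumptions, not part of the described hardware.

```python
def store_hits_load(store_addr, store_size, load_addr, load_size):
    """Return True if the load reads at least one byte written by the store.

    Hypothetical helper: each access is modelled as the half-open byte
    range [addr, addr + size).
    """
    store_end = store_addr + store_size
    load_end = load_addr + load_size
    # Two half-open ranges overlap when each one starts before the other ends.
    return store_addr < load_end and load_addr < store_end
```

For example, a 4-byte store at address 0x100 is hit by a 4-byte load at 0x102, but not by a 4-byte load at 0x104.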
Additionally, processors have generally been limited to executing stores in program order with respect to other stores. Generally, stores are presented in order to the memory system (e.g. to preserve memory consistency in multiprocessor configurations). Additionally, a processor must be able to determine the order of stores executed by that processor to allow for correct forwarding of store data to dependent loads. Another reason for the in-order execution limitation for stores is that the store queue is finite. A deadlock condition could result if the store queue is filled with speculatively executed stores and an older store is not yet executed. Since the speculatively executed stores cannot be committed (and removed from the store queue) until the older store is committed, and since the older store cannot be executed because the store queue is full, stores cannot be completed and a deadlock results. A method for executing stores out of order with respect to other stores which does not deadlock is therefore desired.
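The deadlock condition described above can be illustrated with a simplified model. The sketch below assumes a store queue whose entries are claimed only when a store executes, and in which a store is committed (freeing its entry) as soon as it is the oldest uncommitted store; the function name `deadlocks` and the use of integer ages (0 being the oldest store) are illustrative assumptions.

```python
def deadlocks(capacity, execution_order):
    """Return True if allocating entries at execution time deadlocks.

    execution_order lists store ages (0 = oldest in program order) in the
    order in which the stores attempt to execute. Stores commit, and free
    their entries, strictly in program order.
    """
    queue = set()        # ages of executed but uncommitted stores
    next_to_commit = 0   # age of the oldest store not yet committed
    for age in execution_order:
        # Commit every store that has become the oldest uncommitted store.
        while next_to_commit in queue:
            queue.remove(next_to_commit)
            next_to_commit += 1
        if len(queue) == capacity:
            # Every entry holds a store younger than an unexecuted store,
            # so no entry can ever be freed.
            return True
        queue.add(age)
    return False
```

With a two-entry queue, the execution order 1, 2, 0 deadlocks because stores 1 and 2 fill the queue before store 0 executes, whereas the in-order sequence 0, 1, 2 does not.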
It is noted that loads, stores, and other instructions or instruction operations may be referred to herein as being older or younger than other instructions or instruction operations. A first instruction is older than a second instruction if the first instruction precedes the second instruction in program order (i.e. the order of the instructions in the program being executed). A first instruction is younger than a second instruction if the first instruction is subsequent to the second instruction in program order.
The problems outlined above are in large part solved by a processor as described herein. The processor includes a store queue and a store queue number assignment circuit. The store queue number assignment circuit assigns store queue numbers to stores, and operates upon instruction operations prior to the instruction operations reaching a point in the pipeline of the processor at which out of order instruction processing begins. Thus, store queue entries may be reserved for stores according to the program order of the stores. Stores may be executable out of order, since store queue entries are provided for the stores.
Additionally, in one embodiment, the store queue number identifying the youngest store represented in the store queue may be assigned to loads. In this manner, loads may determine which stores in the store queue are older or younger than the load based on relative position within the store queue. Checking for store queue hits may be qualified with the entries between the head of the store queue and the entry indicated by the load's store queue number. In one particular embodiment, the store queue number may include an additional "toggle" bit which is toggled each time the assignment of store queue numbers reaches the maximum store queue entry and wraps to zero. If the toggle bit of the store in the store queue entry identified by the load's store queue number differs from the toggle bit of the load's store queue number, then the store queue entry has been reassigned to a store younger than the load (subsequent to the retirement and commitment of the store previously occupying that store queue entry). Thus, the load is older than the stores in the store queue and store queue hits are not detected.
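The toggle bit scheme described above may be sketched in software as follows. This is an illustrative model under stated assumptions rather than the circuit itself: the names `QueueNumber`, `Assigner`, and `entries_older_than_load` are hypothetical, and each store queue entry is represented only by the toggle bit of its current occupant (or `None` if the entry is free).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueueNumber:
    index: int    # store queue entry number
    toggle: int   # flips each time assignment wraps to entry zero

class Assigner:
    """Assigns store queue numbers in program order (hypothetical model)."""
    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.index = 0
        self.toggle = 0
        self.youngest = None  # number of the most recently assigned store

    def assign_store(self):
        number = QueueNumber(self.index, self.toggle)
        self.index += 1
        if self.index == self.num_entries:
            self.index = 0     # wrap to entry zero
            self.toggle ^= 1   # and flip the toggle bit
        self.youngest = number
        return number

    def assign_load(self):
        # A load receives the number of the youngest store ahead of it.
        return self.youngest

def entries_older_than_load(head, load_number, occupant_toggle):
    """Return the entry indices a load should check for hits.

    occupant_toggle[i] is the toggle bit of the store held in entry i,
    or None if entry i is free (an assumption of this sketch).
    """
    if load_number is None:
        return []
    occupant = occupant_toggle[load_number.index]
    if occupant is None or occupant != load_number.toggle:
        # The entry was freed or reassigned to a younger store, so every
        # store still in the queue is younger than the load.
        return []
    entries = []
    i = head
    while True:
        entries.append(i)
        if i == load_number.index:
            break
        i = (i + 1) % len(occupant_toggle)
    return entries
```

In a four-entry queue, a load following the stores assigned entries 0 and 1 checks only those two entries while they remain in the queue. Once both have been committed and the entries reassigned after a wrap, the toggle bits differ and the load checks no entries.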
Broadly speaking, a processor is contemplated, comprising a store queue and a store queue number assignment circuit. The store queue includes a plurality of store queue entries, wherein each of the plurality of store queue entries is configured to store address and data information corresponding to a store memory operation. The store queue number assignment circuit is coupled to receive a first store memory operation and to assign a first store queue number indicative of a first one of the plurality of store queue entries to the first store memory operation. The store queue number assignment circuit is operable at a first pipeline stage of a pipeline employed by the processor. The first pipeline stage is prior to commencement of out of order instruction processing within the pipeline. Additionally, a computer system is contemplated including the processor and an input/output (I/O) device configured to communicate between the computer system and another computer system to which the I/O device is couplable.
Additionally, a method is contemplated. A store queue number is assigned to a store memory operation prior to the store memory operation reaching a pipeline stage at which out of order processing commences. The store memory operation is executed. Address and data information corresponding to the store memory operation is stored into a store queue entry of a store queue, the store queue entry identified by the store queue number.
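The three steps of the method may be sketched as follows, as a simplified illustration only. The function name `execute_out_of_order` is hypothetical, and the sketch assumes that the number of stores does not exceed the number of store queue entries, so that no entry is reused.

```python
def execute_out_of_order(stores, num_entries, execution_order):
    """Model the assign, execute, and store-into-entry steps.

    stores is a list of (address, data) pairs in program order.
    execution_order lists positions in `stores` in the order executed.
    """
    if len(stores) > num_entries:
        raise ValueError("sketch assumes no store queue entry is reused")
    # Step 1: assign store queue numbers in program order, before any
    # out of order processing takes place.
    numbers = list(range(len(stores)))
    queue = [None] * num_entries
    # Step 2: execute the stores, possibly out of program order.
    for position in execution_order:
        address, data = stores[position]
        # Step 3: write address and data into the entry identified by
        # the store queue number assigned in step 1.
        queue[numbers[position]] = (address, data)
    return queue
```

Because each entry is reserved before execution, the store queue contents reflect program order regardless of the order in which the stores execute.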