When performing load and store instructions, typical prior art microprocessors rely on a searchable queue containing content-addressable memory (CAM) logic to enforce ordering among memory operations and to forward data from store instructions to load instructions while high-latency instructions are accessing data from memory (“pending”). An instruction can incur high latency when its data is not present in a relatively fast memory structure, such as a cache memory, and must instead be retrieved from a memory structure having a relatively slow access time, such as dynamic random access memory (DRAM). The absence of the desired data from a particular memory structure is commonly referred to as a “miss”, while the presence of the data within a memory structure is commonly referred to as a “hit”.
FIG. 1 illustrates a prior art processor architecture including logic for servicing instructions that are independent of a high-latency instruction. The prior art architecture of FIG. 1 can service instructions continuously without stalling the processor, including instructions that are independent of long-latency instructions, such as loads that are accessing data from a relatively slow memory source (e.g., DRAM). In particular, instructions are decoded by the instruction decoder, assigned registers by the allocate and register renamer, and stored as micro-operations (uops) in uop queues, from which they are scheduled for execution by the functional units and their results committed to the register file.
The prior art architecture of FIG. 1 allows miss-independent instructions to use register file and scheduler resources by forcing long-latency instructions, and the instructions that depend on them, to relinquish scheduling and register file resources until the miss is serviced. This allows miss-independent instructions to execute and complete without being blocked by long-latency instructions or their dependents.
In FIG. 1, instructions dependent on the long-latency instruction are temporarily stored in a wait buffer, while independent instructions are serviced during the pendency of the long-latency instruction. However, to ensure correct memory ordering, all store instructions concurrently in process (“in flight”) must be retained for the pendency of the long-latency instruction, which typically requires large store queues (e.g., L1 and L2 store queues). These store queues must grow as the number of instructions processed during the pendency of the long-latency instruction increases.
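The register-side partitioning described above can be illustrated with a small software model. This is a minimal sketch, not the circuit of FIG. 1: the `Uop` record, the `partition` function, and the example register names are hypothetical, and the model tracks only register dependences through a "poisoned" set, ignoring memory dependences (which is exactly why the store queues must still retain every in-flight store).

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Uop:
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()
    misses: bool = False  # True for a load that missed and is now pending


def partition(uops: List[Uop]) -> Tuple[List[str], List[str]]:
    """Split uops (in program order) into miss-independent uops that may
    execute now and miss-dependent uops parked in a wait buffer.
    Hypothetical model: register dependences only, no memory dependences."""
    poisoned: Set[str] = set()  # registers whose value depends on a pending miss
    ready: List[str] = []
    wait_buffer: List[str] = []
    for u in uops:
        depends_on_miss = u.misses or any(s in poisoned for s in u.srcs)
        if depends_on_miss:
            # Park the uop and mark its destination as not yet available.
            wait_buffer.append(u.name)
            if u.dest is not None:
                poisoned.add(u.dest)
        else:
            # Independent uop: executes now and releases its resources.
            ready.append(u.name)
            if u.dest is not None:
                poisoned.discard(u.dest)  # overwritten by an independent value
    return ready, wait_buffer


program = [
    Uop("ld r1", "r1", ("r0",), misses=True),
    Uop("add r2", "r2", ("r1", "r3")),
    Uop("sub r4", "r4", ("r5", "r6")),
    Uop("mul r7", "r7", ("r2",)),
    Uop("or r1", "r1", ("r4",)),
    Uop("and r8", "r8", ("r1",)),
]
ready, waiting = partition(program)
print(ready)    # ['sub r4', 'or r1', 'and r8']
print(waiting)  # ['ld r1', 'add r2', 'mul r7']
```

Note that `or r1` rewrites `r1` with a miss-independent value, so the later `and r8` no longer depends on the pending load and may proceed.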
Moreover, searching these store queues may require extra logic, such as CAM logic. In particular, a load operation looking for a corresponding store operation whose data can satisfy it typically searches a relatively large store queue using CAM logic whose size grows with the size of the queue.
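The associative search a load performs can be modeled in software as follows. This is a minimal sketch under simplifying assumptions (full-word stores only, no partial overlaps, ages given as integer sequence numbers); the `StoreQueue` class and its `forward` method are hypothetical names. In hardware every entry's comparator operates in parallel; here each loop iteration stands in for one comparator, so the comparison count equals the queue size, mirroring how CAM logic scales with queue capacity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    seq: int   # program-order age; smaller is older
    addr: int
    data: int


class StoreQueue:
    """Software model of a CAM-searched store queue (hypothetical)."""

    def __init__(self) -> None:
        self.entries: List[StoreEntry] = []

    def insert(self, seq: int, addr: int, data: int) -> None:
        self.entries.append(StoreEntry(seq, addr, data))

    def forward(self, load_seq: int, load_addr: int) -> Tuple[Optional[int], int]:
        """Return (forwarded data or None, number of address comparisons)."""
        best: Optional[StoreEntry] = None
        comparisons = 0
        for e in self.entries:
            comparisons += 1  # one comparator per queue entry
            if e.addr == load_addr and e.seq < load_seq:
                # Keep the youngest store that is still older than the load.
                if best is None or e.seq > best.seq:
                    best = e
        return (best.data if best is not None else None), comparisons


sq = StoreQueue()
sq.insert(1, 0x100, 11)
sq.insert(3, 0x200, 22)
sq.insert(5, 0x100, 33)
sq.insert(9, 0x100, 44)
print(sq.forward(7, 0x100))  # (33, 4): store 5 is the youngest older match
print(sq.forward(2, 0x200))  # (None, 4): store 3 is younger than the load
```

Every load pays one comparison per entry regardless of whether a match exists, which is the cost that grows as the queue is enlarged to hold all stores in flight during a miss.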
Searching a large store queue that has CAM logic can increase cycle time or increase the number of cycles needed to access the store queue. Further, using searchable store queues to forward store data to the proper load instruction becomes increasingly difficult as the number of in-flight instructions increases during processing of a long-latency instruction, such as a load servicing a miss. Moreover, search logic, such as CAM logic, typically associated with searchable store queues can require significant power, die area, and processing cycles to satisfy independent load operations while other long-latency operations are pending.