Modern information handling systems (IHSs) often employ processors that include multiple stages that together form a pipeline. For example, a pipelined processor may include a fetch unit, a decoder, an instruction queue, a number of execution units, and a completion or writeback unit. The fetch unit fetches instructions from a memory cache or system memory to provide an instruction stream. The decoder decodes the fetched instructions into opcodes and operands. An issue unit or dispatch unit sends decoded instructions to appropriate execution units for execution. A completion or writeback unit writes the completed results back to an appropriate processor register or memory. While one stage of the pipelined processor performs a task on one instruction, another stage performs a different task on another instruction. For example, the fetch unit fetches a first instruction from an instruction cache. Next, while the decoder decodes the fetched first instruction, the fetch unit fetches another instruction from the instruction cache. Breaking instruction handling into separate tasks or stages to form a pipeline in this manner may significantly increase processor performance.
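The overlap between stages described above can be illustrated with a minimal sketch. The five stage names and the function `pipeline_schedule` are illustrative assumptions rather than part of any particular processor design, and the model assumes an ideal pipeline in which every stage takes one cycle and no stalls occur.

```python
# Hypothetical stage names for illustration only.
STAGES = ["fetch", "decode", "issue", "execute", "writeback"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return one dict per cycle mapping stage name -> instruction index.

    Assumes an ideal pipeline: one cycle per stage, no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(stages):
            instr = cycle - depth  # instruction that reached this stage
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        schedule.append(occupancy)
    return schedule

if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(3)):
        print(cycle, occupancy)
```

In this sketch, cycle 1 shows the decoder working on instruction 0 while the fetch unit already fetches instruction 1, which corresponds to the overlap described in the paragraph above.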
Together, the fetch unit, the decoder, the issue unit and the execution units may form a processor core. A processor may include multiple processor cores to increase performance. The processor cores of more advanced processors may employ an issue unit that includes an issue queue to enable out-of-order execution of instructions. The issue queue dispatches instructions that exhibit no dependencies to execution units for execution. When an instruction exhibits a dependency, it remains in the issue queue until resolution of the dependency. The issue queue dispatches younger instructions without dependencies to the execution units while an instruction with a dependency remains in the issue queue.
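The dispatch behavior of such an issue queue may be sketched as follows. This is a simplified software model under stated assumptions: the function `dispatch_ready`, the representation of dependencies as sets of names, and the instruction labels are all hypothetical and chosen only for illustration.

```python
def dispatch_ready(issue_queue, completed):
    """Split the issue queue into dispatchable and waiting instructions.

    issue_queue: list of (name, dependencies) tuples, oldest first, where
                 dependencies is a set of result names the instruction needs.
    completed:   set of result names that are already available.
    Returns (dispatched_names, remaining_queue); both preserve program order.
    """
    dispatched, remaining = [], []
    for name, deps in issue_queue:
        if deps <= completed:          # every dependency is resolved
            dispatched.append(name)
        else:                          # instruction waits in the queue
            remaining.append((name, deps))
    return dispatched, remaining

if __name__ == "__main__":
    queue = [("i0", {"load_a"}), ("i1", set()), ("i2", set())]
    print(dispatch_ready(queue, completed=set()))
```

Here the oldest instruction `i0` remains queued until its dependency resolves, while the younger instructions `i1` and `i2` dispatch ahead of it.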
The execution units of a processor core typically include a load/store execution unit (LSU), an integer execution unit and a floating point execution unit. The LSU handles load/store requests from the other execution units. A core interface unit may interface the LSU of a processor core to a memory cache and system memory to enable the LSU to access memory data. The LSU, acting as a master, may issue requests to fetch data from the memory cache or system memory. Some core interface units may employ an age queue to track the age of each load/store request. It is generally desirable that the processor service older memory requests ahead of younger memory requests. Thus, tracking the age of the requests with an age queue is helpful.
One conventional type of age queue is a first-in first-out (FIFO) age queue. Unfortunately, while a FIFO age queue is relatively simple to implement, a delay in the servicing of any request in the queue may block the servicing of all requests that follow it. Another conventional approach is to implement an age queue that allows for out-of-order servicing of load/store requests. In such an approach, the age queue tracks the age of each load/store request. If a particular load/store request is not immediately serviceable, then the core interface unit services the next oldest load/store request as determined by age information stored in the age queue. One problem with this approach is that holes form in the age queue when load/store requests complete out-of-order. The age queue may require a sophisticated shifter to handle these holes. When a load/store request enters the age queue, it proceeds directly to the end of the queue (i.e., the head of the queue) or as deep into the queue as possible without being blocked by an entry corresponding to another load/store request. As holes form in the age queue, age queue control logic may need to shift the pending requests down to plug the holes. In a system where multiple new requests can enter the queue and multiple serviced requests can complete out-of-order in the same cycle, one may expect the complexity of the shifter control to grow significantly to manage the addition and removal of queue entries. This approach becomes very expensive to implement at high frequencies and consumes a substantial amount of power. This approach may also become less practical as the queue depth increases.
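The two conventional approaches above can be contrasted with a minimal software sketch. It is not a hardware description: the function names `fifo_service`, `oldest_serviceable` and `compact`, the use of `None` to mark a hole, and the convention that index 0 is the head (oldest entry) are all assumptions made for illustration.

```python
def fifo_service(queue, serviceable):
    """Service requests strictly in order; stop at the first blocked one.

    queue is modified in place; index 0 is the oldest request.
    """
    serviced = []
    while queue and queue[0] in serviceable:
        serviced.append(queue.pop(0))
    return serviced

def oldest_serviceable(slots, serviceable):
    """Return the index of the oldest serviceable request, or None.

    slots is ordered oldest first; a None entry marks a hole.
    """
    for index, request in enumerate(slots):
        if request is not None and request in serviceable:
            return index
    return None

def compact(slots):
    """Shift pending requests toward the head to plug holes, keeping age order."""
    pending = [request for request in slots if request is not None]
    return pending + [None] * (len(slots) - len(pending))

if __name__ == "__main__":
    # Request A is delayed; B and C are ready.
    print(fifo_service(["A", "B", "C"], {"B", "C"}))   # FIFO: A blocks B and C
    slots = ["A", "B", "C", None]
    index = oldest_serviceable(slots, {"B", "C"})
    slots[index] = None                                 # servicing B leaves a hole
    print(compact(slots))
```

In this model a single `compact` call is trivial, but it stands in for the shifter: a hardware queue would have to perform the equivalent movement of entries within a cycle, for several insertions and removals at once, which is where the cost described above arises.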
What is needed is a processor apparatus and methodology that addresses the age queue request handling problems above.