1. Field of the Invention
The present invention generally relates to allocating resources within a microprocessor. More particularly, the invention relates to managing load and store queues in a simultaneous multi-threaded processor. Still more particularly, the invention relates to dynamically allocating entries in load and store queues to more efficiently use the resources contained within the processor.
2. Background Information
All computers have a device called a “microprocessor.” A microprocessor, or simply “processor,” comprises the logic, typically a semiconductor device, which executes software. Microprocessors fetch software instructions from memory and execute them. Each instruction generally undergoes several stages of processing. For example, the instruction is fetched and decoded to determine the type of instruction (load, store, add, multiply, etc.). Then, the instruction is scheduled, executed and eventually retired. Each stage of processing may require multiple clock cycles. It has been recognized that the next instruction to be executed by a processor can be fetched and entered into the processor's pipeline before the previous instruction is retired. Thus, some processors are designed with pipelined architectures to permit multiple instructions to be at various stages of processing at any one point in time. An instruction that is in the pipeline, but not yet retired, is said to be “in flight.”
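The notion of instructions being "in flight" at several stages at once can be illustrated with a minimal sketch. The stage names and the one-stage-per-cycle timing below are simplifying assumptions for illustration only, not a description of any particular processor:

```python
# Simplified pipeline stages; real pipelines may spend multiple
# clock cycles in each stage.
STAGES = ["fetch", "decode", "schedule", "execute", "retire"]

def pipeline_occupancy(cycle, issue_cycles):
    """For each instruction and the cycle it entered the pipeline,
    report the stage it occupies at `cycle` (one stage per cycle).
    Instructions not yet fetched or already retired are omitted."""
    occupancy = {}
    for name, issued in issue_cycles.items():
        idx = cycle - issued
        if 0 <= idx < len(STAGES):
            occupancy[name] = STAGES[idx]
    return occupancy
```

At cycle 2, an instruction fetched at cycle 0 is being scheduled while instructions fetched at cycles 1 and 2 are being decoded and fetched, respectively; all three are in flight simultaneously.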
A microprocessor includes a number of internal resources that it uses to process and execute the instructions. The preferred embodiments of the invention described below are directed to utilizing those resources more efficiently. More specifically, the preferred embodiments are directed to techniques for managing load and store queues in the processor. A load queue is a buffer into which load instructions are placed pending retirement. A load instruction causes data to be retrieved from memory. A store queue is a buffer in which store instructions are held until their effects can be committed to machine state. A store instruction causes data to be written to memory. Typically, store and load queues have a limited number of entries into which store and load instructions can be written. The number of entries typically is less than the total number of store and load instructions that may be in flight at any given time.
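The fixed-capacity behavior of such a queue can be sketched as follows. The class and method names are hypothetical, chosen only to illustrate that an instruction cannot claim an entry when every entry is occupied, and that retiring (committing) the oldest entry frees it for reuse:

```python
from collections import deque

class StoreQueue:
    """Illustrative fixed-capacity store queue: each entry holds one
    in-flight store from allocation until its data is committed."""

    def __init__(self, num_entries):
        self.capacity = num_entries
        self.entries = deque()   # in-flight stores, oldest first

    def allocate(self, store_insn):
        """Claim an entry; returns False (a stall) when the queue is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(store_insn)
        return True

    def commit_oldest(self):
        """Commit the oldest store to machine state, freeing its entry."""
        return self.entries.popleft() if self.entries else None
```

A load queue behaves analogously, with entries freed at retirement rather than at commit.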
Some processors are referred to as simultaneous “multi-threaded” processors, which means they can execute multiple threads of software simultaneously. Some processors include thread processing units (“TPUs”). A TPU is hardware in the processor that provides the capability of running a process by holding the state of the running process, primarily its program counter (“PC”) and registers. A processor that can hold enough state for four TPUs, for example, can run four processes on the same set of functional units, instruction queue, caches, etc.
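The per-thread state a TPU holds might be sketched as below. The field names and the assumption of 32 general-purpose registers are illustrative only; actual register file sizes vary by architecture:

```python
from dataclasses import dataclass, field

@dataclass
class TPUState:
    """Hypothetical sketch of the architectural state one TPU holds,
    allowing several processes to share one set of functional units,
    instruction queue, and caches."""
    pc: int = 0
    registers: list = field(default_factory=lambda: [0] * 32)  # 32 GPRs assumed

# A processor holding state for four TPUs keeps four such contexts at once:
tpus = [TPUState() for _ in range(4)]
```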
In many previous processor designs, entries in the load and store queues were pre-allocated to each of the TPUs. Although such pre-allocation is generally satisfactory, the following problem arises. On one hand, a TPU operates sub-optimally if it actually needs more load/store queue entries than it was allocated. On the other hand, a TPU may not need all of the load/store queue entries it was allocated, leaving those entries idle. Accordingly, there are situations in which a scheme that pre-allocates processor resources operates in a non-optimal fashion. As such, a scheme is needed that allocates the load and store queue entries to ensure more efficient use of the queues.
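The inefficiency of static pre-allocation can be contrasted with a shared pool in a brief sketch. Both classes and their parameters are hypothetical illustrations, not the claimed allocation scheme: with eight entries split evenly among four TPUs, a TPU needing three entries stalls under pre-allocation even while six entries belonging to other TPUs sit idle, whereas a dynamically shared pool satisfies the request:

```python
class PartitionedQueue:
    """Static pre-allocation: each TPU owns a fixed slice of entries."""

    def __init__(self, total_entries, num_tpus):
        self.quota = total_entries // num_tpus
        self.used = [0] * num_tpus

    def allocate(self, tpu):
        if self.used[tpu] >= self.quota:
            return False          # this TPU's slice is full, even if
        self.used[tpu] += 1       # other TPUs' slices sit unused
        return True

class SharedQueue:
    """Dynamic allocation: any TPU may claim any free entry."""

    def __init__(self, total_entries):
        self.free = total_entries

    def allocate(self, tpu):
        if self.free == 0:
            return False          # stall only when the whole pool is exhausted
        self.free -= 1
        return True
```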