1. Field of the Invention
This invention relates to microprocessors, and more particularly, to the management of resource allocation of threads for efficient execution of instructions.
2. Description of the Relevant Art
Modern processor cores, or processors, are pipelined in order to increase throughput of instructions per clock cycle. However, the throughput may still be reduced due to certain events. One event is a stall, which may be caused by a branch misprediction, a cache miss, a data dependency, or another condition, and in which no useful work may be performed for a particular instruction during a clock cycle. Another event may be that resources, such as circuitry for an arithmetic logic unit (ALU) or for a load-store unit (LSU), may not be used for one or more clock cycles due to the type of instruction(s) being executed in a particular clock cycle.
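The effect of stalls on throughput can be illustrated with simple arithmetic. The following is a minimal sketch, not part of the described hardware; the function name `effective_ipc` and its parameters are hypothetical, and the model assumes that stall cycles simply add to the cycle count of an otherwise ideal pipeline.

```python
def effective_ipc(instructions: int, ideal_ipc: float, stall_cycles: int) -> float:
    """Return instructions per clock cycle after stall cycles are accounted for.

    Assumption (illustrative only): every stall cycle is a cycle in which
    no instruction completes, so it adds directly to the total cycle count.
    """
    if instructions <= 0 or ideal_ipc <= 0 or stall_cycles < 0:
        raise ValueError("instructions and ideal_ipc must be positive; stall_cycles non-negative")
    # Cycles the pipeline would need if it never stalled.
    ideal_cycles = instructions / ideal_ipc
    # Stalls lengthen execution without retiring any additional instructions.
    return instructions / (ideal_cycles + stall_cycles)


if __name__ == "__main__":
    print(effective_ipc(1000, 2.0, 0))
    print(effective_ipc(1000, 2.0, 500))
```

Under these assumptions, a pipeline that could sustain two instructions per cycle drops to one instruction per cycle when stall cycles equal the ideal cycle count.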
Different techniques are used to fill these unproductive cycles in a pipeline with useful work. Some examples include loop unrolling of instructions by a compiler, branch prediction mechanisms within a core and out-of-order execution within a core. An operating system may divide a software application into processes and further divide processes into threads. A thread, or strand, is a sequence of instructions that may share memory and other resources with other threads and may execute in parallel with other threads. A processor core may be constructed to execute more than one thread per clock cycle in order to increase efficient use of the hardware resources and reduce the effect of stalls on overall throughput. A microprocessor may include multiple processor cores to further increase parallel execution of multiple instructions per clock cycle.
However, an operating system (O.S.) may place a thread, or strand, in a parked state. A parked state is an idle state for the strand where no instructions for that particular strand are assigned to the hardware resources of the strand. This may occur when there is insufficient work and the strand enters an idle loop in the kernel. Within a core of multiple strands, any resources shared among strands are then used only by the strands that are not parked. The only time the shared resources are completely idle is when all the strands within the core are parked.
A problem may arise with resource management within a core when one or more strands are parked. The instruction fetch and dispatch mechanisms may not be able to sustain a good instruction stream rate, and hence later stages of the pipeline will have no instructions, or only a limited set of instructions, to work on. Therefore, the throughput, or instructions per clock cycle (IPC), may not be as high as it could be. This may be due to the complexity and latency of the fetch and dispatch mechanisms. If a microprocessor is designed to execute many strands by incorporating multiple cores, there may be larger fetch latencies due to circuit constraints, such as routing distances and added stages of logic. A core with parked strands and an active strand may not have all of its resources efficiently used by the active strand, because the active strand may not receive a steady, sufficient supply of instructions for the above reasons. Also, a multi-cycle latency between fetches for a particular active strand may be exposed, as no useful work will be performed by the core during that latency while the other strands are parked.
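The exposed fetch latency described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the mechanism of any particular core: the function `fetch_utilization`, the round-robin selection policy, and the rule that one strand may fetch per cycle are all hypothetical simplifications chosen for illustration.

```python
def fetch_utilization(active_strands: int, fetch_latency: int, cycles: int) -> float:
    """Fraction of cycles in which some active strand performs a fetch.

    Assumptions (illustrative only):
      - at most one strand fetches per clock cycle;
      - after fetching, a strand cannot fetch again for `fetch_latency` cycles;
      - ready strands are selected round-robin;
      - parked strands never fetch, so only active strands are modeled.
    """
    if active_strands < 0 or fetch_latency < 1 or cycles < 1:
        raise ValueError("invalid simulation parameters")
    next_ready = [0] * active_strands  # earliest cycle each strand may fetch again
    busy_cycles = 0
    start = 0  # round-robin pointer
    for cycle in range(cycles):
        for offset in range(active_strands):
            strand = (start + offset) % active_strands
            if next_ready[strand] <= cycle:
                next_ready[strand] = cycle + fetch_latency
                busy_cycles += 1
                start = (strand + 1) % active_strands
                break
        # If no strand was ready, the fetch slot goes unused this cycle.
    return busy_cycles / cycles


if __name__ == "__main__":
    for active in (4, 2, 1):
        print(active, fetch_utilization(active, fetch_latency=4, cycles=400))
```

In this model, four active strands with a four-cycle fetch latency hide one another's latency and keep the fetch slot busy every cycle, whereas a single active strand with three parked siblings uses only one cycle in four.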
In view of the above, an efficient method for the management of resource allocation of threads for efficient execution of instructions is desired.