1. Field of the Invention
The present invention is directed to processing instructions in one or more threads of execution.
2. Description of the Related Art
Processors execute processor instructions using execution pipelines, dispatch units, instruction decode units and other units. Processor instructions are retrieved from a memory device by an instruction fetch unit. A dispatch unit fetches one or more processor instructions from one or more threads and dispatches the instruction(s) to one or more execution pipelines having an open execution slot. An execution pipeline receives and executes one or more instructions. Once another execution slot is available in one of the execution pipelines, additional processor instructions are fetched for a thread and dispatched to the appropriate execution pipeline.
Once dispatched, each processor instruction is processed by one or more execution pipelines until the instruction stalls, is flushed or completes. In some systems, when an instruction stalls within an execution pipeline, all instructions waiting to be processed in that execution pipeline are delayed. Thus, although only one thread is waiting for the result of an instruction that caused the stall, other threads having instructions within the stalled execution pipeline are effectively stalled as well. Additionally, in some systems, when one execution pipeline is stalled, other execution pipelines stall as well.
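The effect of one stalled instruction on other threads sharing the same execution pipeline can be sketched with a toy model. The sketch below assumes a strictly in-order pipeline in which every instruction occupies the pipeline for one cycle plus any stall cycles it incurs; the function name `completion_cycles` and the tuple layout are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def completion_cycles(queue: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (thread, finish_cycle) for each instruction in an in-order pipeline.

    Each queue entry is (thread_name, stall_cycles). Assumption: an
    instruction takes one cycle plus its own stall cycles, and no later
    instruction can pass an earlier one.
    """
    cycle = 0
    finished = []
    for thread, stall in queue:
        cycle += stall + 1  # stall time is paid by everything queued behind
        finished.append((thread, cycle))
    return finished


# Thread A's instruction stalls for nine cycles; thread B's do not stall.
shared = completion_cycles([("A", 9), ("B", 0), ("B", 0)])
# The same two thread B instructions with no stalled instruction ahead of them.
alone = completion_cycles([("B", 0), ("B", 0)])
```

In this model, thread B's instructions finish nine cycles later when queued behind thread A's stalled instruction, even though thread B is not waiting on any result.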
Some stall conditions can be detected when the processor instructions are compiled. For example, an execution pipeline may require ten processing cycles, or clock cycles, to perform a multiply operation. A first instruction for thread A may request a multiply operation. A second instruction for thread A may use the result of the multiply operation. In this scenario, the second instruction will stall for nine processing cycles until the result of the first instruction is available.
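The nine-cycle figure in the example above can be reproduced with a small calculation. The following is a minimal sketch, assuming a single in-order pipeline that can issue one instruction per cycle and a result that becomes available `latency` cycles after issue; the `Instr` class and `stall_cycles` function are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    name: str
    latency: int                      # cycles from issue until the result is available
    depends_on: Optional[int] = None  # index of the producing instruction, if any


def stall_cycles(program: List[Instr]) -> List[int]:
    """Return the number of cycles each instruction stalls before issuing.

    Assumption: one instruction can issue per cycle, in program order, and a
    dependent instruction cannot issue until its producer's result is ready.
    """
    ready_at: List[int] = []  # cycle at which each instruction's result is ready
    stalls: List[int] = []
    next_slot = 0             # earliest cycle the next instruction could issue
    for ins in program:
        issue = next_slot
        if ins.depends_on is not None:
            issue = max(next_slot, ready_at[ins.depends_on])
        stalls.append(issue - next_slot)
        ready_at.append(issue + ins.latency)
        next_slot = issue + 1
    return stalls


# Thread A: a ten-cycle multiply followed by an instruction that uses its result.
thread_a = [Instr("mul", 10), Instr("use", 1, depends_on=0)]
result = stall_cycles(thread_a)
```

Under these assumptions the multiply issues in cycle 0 and its result is ready in cycle 10, while the dependent instruction could otherwise have issued in cycle 1, giving a stall of nine cycles.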
The stall condition discussed above can be detected at compile time. If detected, a compiler may avoid the stall by scheduling instructions that collectively require at least nine processing cycles after the first instruction. If other instructions cannot be scheduled after the first instruction, the compiler can insert up to nine no operation instructions (nops) between the first two instructions. The nops are dispatched and executed by an execution pipeline like any other instruction. The disadvantages of using nops are that the instruction image becomes larger and requires more memory, and that the nops take valuable execution slots in execution pipelines. Dispatching a nop for an execution slot delays execution of other processor instructions which could have been dispatched for that execution slot. This includes instructions which could have been dispatched for other threads. Additionally, stall conditions caused by processor instructions residing in different threads are not detected by compiler programs and instruction processing systems.
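The nop padding described above, and its cost in instruction image size, can be illustrated as follows. This is a minimal sketch and not an actual compiler pass: it assumes one instruction issues per cycle in program order, and the function name `pad_with_nops` and the tuple layout are hypothetical.

```python
from typing import List, Optional, Tuple

# (name, latency in cycles, index of the producing instruction or None)
Instruction = Tuple[str, int, Optional[int]]


def pad_with_nops(program: List[Instruction]) -> List[str]:
    """Insert nops so that no instruction issues before its operand is ready.

    Assumption: one instruction (or nop) issues per cycle, in order, and a
    result is ready `latency` cycles after its producer issues.
    """
    image: List[str] = []
    ready_at: List[int] = []  # indexed by position in the original program
    cycle = 0
    for name, latency, producer in program:
        if producer is not None:
            # Fill every issue slot with a nop until the producer's result is ready.
            while cycle < ready_at[producer]:
                image.append("nop")
                cycle += 1
        image.append(name)
        ready_at.append(cycle + latency)
        cycle += 1
    return image


# No other work available: nine nops are needed between the two instructions.
padded = pad_with_nops([("mul", 10, None), ("use", 1, 0)])

# One independent instruction scheduled in between: only eight nops are needed.
partly_filled = pad_with_nops([("mul", 10, None), ("load", 1, None), ("use", 1, 0)])
```

In this model a two-instruction sequence grows to eleven entries in the instruction image, and each nop occupies an issue slot that could otherwise have carried an instruction, including one from another thread.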
Thus, there is a need to better handle potential stall conditions.