The present invention relates to processors and methods of operating processors. The invention has particular application in parallel pipelined processors, such as very long instruction word (VLIW) processors.
High-performance processors use a technique known as pipelining to increase the rate at which instructions can be processed. Pipelining works by executing an instruction in several phases, with each phase being executed in a single pipeline stage. Instructions flow through successive pipeline stages, with all partially-completed instructions moving one stage forward on each processor clock cycle. An instruction completes execution when it reaches the end of the pipeline.
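By way of illustration only, the flow of instructions through an ideal pipeline can be modelled in software as follows. This is a minimal sketch, not a description of any particular processor: the function name `run_pipeline`, the list-based representation of the stages and the instruction labels are all hypothetical, and no stalls are modelled.

```python
from typing import List, Optional, Tuple


def run_pipeline(instructions: List[str], num_stages: int) -> Tuple[int, List[str]]:
    """Model an ideal (stall-free) pipeline; names here are illustrative only.

    Returns the number of clock cycles taken and the completion order.
    """
    stages: List[Optional[str]] = [None] * num_stages  # stages[0] is the first stage
    pending = list(instructions)
    completed: List[str] = []
    cycles = 0
    while len(completed) < len(instructions):
        # On each clock cycle every instruction moves one stage forward
        # and a new instruction (if any remain) enters stage 0.
        incoming = pending.pop(0) if pending else None
        stages = [incoming] + stages[:-1]
        cycles += 1
        # An instruction completes during the cycle it spends in the last stage.
        if stages[-1] is not None:
            completed.append(stages[-1])
    return cycles, completed


cycles, order = run_pipeline(["I0", "I1", "I2", "I3", "I4"], num_stages=4)
print(cycles, order)
```

In this model, five instructions pass through four stages in 4 + 5 − 1 = 8 cycles, so once the pipeline is full one instruction completes on every clock cycle.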
Processors attempt to keep pipelines full at all times, thus ensuring a high rate of instruction completion. However, an instruction may be unable to progress through one of the stages of a pipeline in a single clock cycle, for example because it needs to access slow memory or to compute a multi-cycle operation. Such an event is known as a stall. When stage i of a pipeline stalls, it prevents the instruction at stage i−1 from making forward progress, even if the instruction at stage i−1 is not itself stalled. This in turn stalls stage i−2, and so on back to stage 0 (the first stage). When there is a stall at stage i, a signal flows to all stages from 0 to i−1 in the pipeline to cause them to stall before the next active edge of the pipeline clock.
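The backward propagation of a stall can be sketched in the same way. The function `step` and its parameters are hypothetical names introduced for illustration. The sketch also makes one assumption that is not stated above: stages after the stalled stage continue to advance, so that an empty slot (a bubble, represented by `None`) opens immediately behind them.

```python
from typing import List, Optional, Tuple


def step(
    stages: List[Optional[str]],
    next_instr: Optional[str],
    stalled_stage: Optional[int] = None,
) -> Tuple[List[Optional[str]], Optional[str], bool]:
    """Advance a pipeline by one clock cycle (illustrative model only).

    stages[0] is the first stage. Returns the new stage contents, the
    instruction (if any) that leaves the last stage, and whether
    next_instr was accepted into stage 0.
    """
    n = len(stages)
    if stalled_stage is None:
        # No stall: every instruction moves one stage forward.
        return [next_instr] + stages[:-1], stages[-1], True

    i = stalled_stage
    # Stage i and every earlier stage (0 to i-1) hold their contents.
    held = stages[: i + 1]
    if i == n - 1:
        # The last stage itself is stalled, so nothing leaves the pipeline.
        return held, None, False
    # Assumption: later stages keep draining, leaving a bubble at stage i+1.
    drained = [None] + stages[i + 1 : -1]
    return held + drained, stages[-1], False


before = ["I3", "I2", "I1", "I0"]  # I0 is the oldest instruction
after, leaving, accepted = step(before, "I4", stalled_stage=1)
print(after, leaving, accepted)
```

With a stall at stage 1, the instructions in stages 0 and 1 stay where they are and the new instruction is not accepted, which corresponds to the stall signal reaching every stage from 0 to i−1.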
Some processor architectures provide two or more parallel pipelines for processing different instructions (or different parts of an instruction) simultaneously. In this case, the stall signal must be distributed to all pipelines to ensure that instructions which are issued in parallel also complete in parallel. However, the delay in propagating such a global stall signal may restrict the operating clock frequency of the processor. Furthermore, the distance such a signal would have to travel would grow as more pipelines were added. Hence a processor having more pipelines would need a lower clock frequency, thus defeating the high-throughput objective of adding further pipelines.
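The effect of a global stall signal on parallel pipelines can be illustrated with a simple cycle count. This is a sketch under stated assumptions: `lockstep_cycles` and the `extra_cycles` table are hypothetical, the global stall is modelled as the logical OR of the local stall requests, and the wire delay of the signal itself is not modelled.

```python
from typing import List


def lockstep_cycles(extra_cycles: List[List[int]]) -> int:
    """Count cycles for parallel pipelines that share one global stall signal.

    extra_cycles[p][k] is the number of additional cycles pipeline p needs
    for the k-th group of instructions issued in parallel (illustrative data).
    """
    num_groups = len(extra_cycles[0])
    total = 0
    for k in range(num_groups):
        remaining = [pipe[k] for pipe in extra_cycles]
        total += 1  # the cycle in which the group would normally advance
        # Global stall: asserted while any pipeline still requests a stall,
        # so every pipeline waits for the slowest one.
        while any(r > 0 for r in remaining):
            remaining = [max(r - 1, 0) for r in remaining]
            total += 1
    return total


# Two pipelines, three groups of parallel instructions.
print(lockstep_cycles([[0, 2, 0], [1, 0, 0]]))
```

Each group of parallel instructions takes as long as its slowest member, which keeps the pipelines aligned but requires every stall request to reach every pipeline within a single clock cycle.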