Modern processors are required to execute multiple instructions within a very short period of time. Some processors increase their throughput by executing multiple instructions in parallel. These parallel-processed instructions are arranged in variable-length groups of instructions, also known as Very Long Instruction Words (VLIWs).
Multiple VLIWs can form an instruction loop that should be repetitively executed. The instruction loop is usually detected by comparing a current program counter value to a start address of the instruction loop and, additionally or alternatively, to an end address of the instruction loop.
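The comparison-based loop detection described above can be illustrated by a minimal sketch. It assumes, for illustration only, that the end address is the address of the last VLIW of the loop, that all VLIWs have the same size, and that a hypothetical iteration counter decides when the loop is exited; none of these details is specified in the text above.

```python
def next_program_counter(pc, vliw_size, loop_start, loop_end, iterations_left):
    """Return (next_pc, iterations_left) after the VLIW at `pc` is executed.

    Assumptions (hypothetical, for illustration): `loop_end` is the address
    of the last VLIW of the loop, and `iterations_left` counts the
    iterations still to be run, including the current one.
    """
    # Loop detection: compare the current program counter to the end address.
    if pc == loop_end and iterations_left > 1:
        # Change of flow: the first VLIW follows the last one.
        return loop_start, iterations_left - 1
    # Otherwise execution continues sequentially.
    return pc + vliw_size, iterations_left


# Trace a three-VLIW loop (0x100, 0x110, 0x120) executed twice.
pc, left = 0x100, 2
trace = []
while pc <= 0x120:
    trace.append(pc)
    pc, left = next_program_counter(pc, 0x10, 0x100, 0x120, left)
```

In this sketch the trace visits 0x100, 0x110 and 0x120 twice before falling through to 0x130. The jump from 0x120 back to 0x100 is the change of flow that a fetch unit has to anticipate.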
A typical pipelined processor includes multiple pipeline stages such as a fetch unit, a dispatch unit, a decode unit, an execution unit and the like. The fetch unit fetches instructions from a memory unit via an instruction bus. The fetch unit has a certain memory space for storing instructions. This memory space has a limited size due to cost and semiconductor area constraints.
The size of an instruction loop can exceed the size of the memory space allocated for storing instructions in the fetch unit. An instruction loop that cannot be stored entirely in the fetch unit is referred to as a long instruction loop. When such a long instruction loop is executed, the pipeline stages may need to be stalled, either because of starvation (the pipeline stages can process more instructions than they are being fed) or because of the change of flow associated with fetching and executing the first instruction of the loop (at the start address) immediately after the last instruction.
The size of a VLIW can exceed the width of the instruction bus. In this case the retrieval of VLIWs from the memory unit to the fetch unit can be slower than the rate at which VLIWs are provided to the dispatch unit. For example, if a VLIW can be up to 144 bits long and the instruction bus is 128 bits wide, then the difference between a retrieval rate of 128 bits per cycle and a dispatch rate of up to 144 bits per cycle can empty the fetch unit. The cycle mentioned above can be the execution cycle of a pipeline stage.
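The arithmetic behind this example can be checked with a simplified model. The sketch below assumes, for illustration only, that every VLIW has the maximal length of 144 bits, that exactly 128 bits arrive over the instruction bus each cycle, and that one VLIW is dispatched per cycle whenever enough bits are buffered; the function name and the initial buffer contents are hypothetical.

```python
def cycles_until_starvation(buffered_bits, bus_width_bits=128, vliw_bits=144):
    """Count the cycles in which a full VLIW can still be dispatched.

    Simplified model (an assumption, not a description of a real fetch
    unit): each cycle `bus_width_bits` are added to the buffer, then one
    VLIW of `vliw_bits` is removed if the buffer holds enough bits.
    """
    if vliw_bits <= bus_width_bits:
        raise ValueError("the buffer does not drain when the bus keeps up")
    cycles = 0
    while True:
        buffered_bits += bus_width_bits      # one bus transfer per cycle
        if buffered_bits < vliw_bits:        # not enough bits: starvation
            return cycles
        buffered_bits -= vliw_bits           # dispatch one VLIW
        cycles += 1


print(cycles_until_starvation(64))  # prints 4
```

Under these assumptions the buffer loses 144 - 128 = 16 bits per cycle, so a fetch unit that starts with 64 buffered bits can supply four VLIWs before the dispatch unit has to wait.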
If, for example, the first VLIW of an instruction loop crosses a boundary between two fetch sets (a fetch set being a fixed-size chunk of data transferred from the memory unit to the fetch unit), then the first VLIW will be fetched over two cycles, causing a stall.
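Whether a given VLIW crosses a fetch-set boundary follows from its start address and its length. The sketch below assumes, for illustration only, byte addressing, fetch sets aligned to their own size, a fetch-set size of 16 bytes (128 bits) and one fetch set retrieved per cycle; the function name is hypothetical.

```python
def fetch_cycles_for_vliw(start_address, vliw_bytes, fetch_set_bytes=16):
    """Return the number of fetch sets (and thus fetch cycles) a VLIW spans.

    Assumptions (hypothetical): byte addresses, fetch sets aligned to
    `fetch_set_bytes`, and one fetch set retrieved per cycle.
    """
    if vliw_bytes <= 0:
        raise ValueError("a VLIW must contain at least one byte")
    first_set = start_address // fetch_set_bytes
    last_set = (start_address + vliw_bytes - 1) // fetch_set_bytes
    return last_set - first_set + 1


print(fetch_cycles_for_vliw(0x100, 8))  # prints 1: aligned, single fetch set
print(fetch_cycles_for_vliw(0x10C, 8))  # prints 2: crosses the 0x110 boundary
```

In the second call the 8-byte VLIW occupies addresses 0x10C to 0x113 and therefore spans two fetch sets, which corresponds to the two-cycle fetch described above.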
Imposing limitations on the length of the instruction loop (for example, limiting it to instructions that can be fetched during only two fetch cycles) is not practical.