In signal processing, a high percentage of algorithms use loops, usually with high iteration counts, that consist of relatively few instructions and/or operate on relatively little data, such as a line of pixel blocks. To improve the speed of processing, DSPs have been equipped with so-called address generation units (AGUs). These units generate the next address from the current address. Some units support several addressing modes. The units calculate the addresses using a number of registers, depending on the addressing mode. Address generation units are known for generating data addresses (sometimes also referred to as address computation units (ACUs)) as well as for generating instruction addresses (sometimes also referred to as loop control units).
WO 01/04765 describes a VLIW processor for signal processing. The processor includes four identical processing elements, each with a number of functional units. Each of the processing elements includes as a functional unit an address generation unit (AGU). The AGU supports seven addressing modes, namely direct addressing, base plus offset addressing, indirect/indexed addressing, base plus index addressing, circular indexed addressing and processing element relative addressing. For each of the addressing modes, registers are used to calculate the addresses. For details on the addressing modes, the algorithms for calculating the next address, the registers used by the algorithms and exemplary hardware implementations, the reader is referred to WO 01/04765. The VLIW instruction includes an instruction slot for each of the processing elements. The addressing mode is indicated as part of the instruction. The registers are part of the processor's context and can be loaded, saved, and restored like other registers that are part of the context. The VLIW processor is used in combination with a wide memory for storing the VLIWs. Each memory line stores one VLIW instruction. The memory is accessed for each instruction, which is fetched and fed directly to the decode logic to control the execution of multiple execution units in parallel.
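To illustrate how an address generation unit might derive the next address from a small set of registers, the following is a minimal sketch of base plus offset addressing with optional circular wrap-around. The register names (`base`, `offset`, `step`, `bound`) and the post-update rule are assumptions chosen for illustration only; they are not taken from WO 01/04765, which should be consulted for the actual algorithms.

```python
from dataclasses import dataclass


@dataclass
class AguRegisters:
    """Hypothetical register set; names are illustrative, not from WO 01/04765."""
    base: int    # start address of the buffer
    offset: int  # current position relative to base
    step: int    # increment applied after each access
    bound: int   # buffer length used for circular wrap-around


def next_address(regs: AguRegisters, circular: bool) -> int:
    """Return the address for the current access, then post-update the offset."""
    address = regs.base + regs.offset
    regs.offset += regs.step
    if circular:
        # Circular addressing: wrap the offset back into [0, bound).
        regs.offset %= regs.bound
    return address


# Walk a 12-byte circular buffer in 4-byte steps, starting at 0x100.
regs = AguRegisters(base=0x100, offset=0, step=4, bound=12)
addresses = [next_address(regs, circular=True) for _ in range(5)]
```

In this sketch the fourth access wraps back to the start of the buffer, so a loop body never has to compute or test the address itself; the same register set with `circular=False` simply keeps incrementing.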
The processing elements of the known VLIW processor are single instruction multiple data stream (SIMD) processors, also known as vector processors. A VLIW vector processor potentially has a very high performance for signal processing. Signal processing tasks that would require such a performance, like a software modem for 3G mobile communication standards, are usually composed of many sub-tasks that can be vectorized. Vector processing results in such sub-tasks being completed relatively quickly. "Completed" in this context also covers the situation wherein a block of data has been processed and processing will be resumed at a later moment for a new data block (usually in fixed cycles). Consequently, switching between sub-tasks also occurs relatively frequently. A context switch requires that the current registers of one or more ACUs that are used for the currently halted task are saved and the saved registers for the newly activated or re-activated task are loaded into the relevant registers of the involved ACUs. For each ACU, for example, four registers may be involved. For one ACU the context switch may thus involve a total of 8 register transfers: four registers saved and four registers restored.
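The cost of such a context switch can be sketched as follows. This is only an illustrative model under stated assumptions: the function name `context_switch`, the use of a plain list for the live ACU registers, and the dictionary of saved contexts are all hypothetical, and one register transfer is counted per register saved or restored.

```python
REGISTERS_PER_ACU = 4  # example figure used in the text above


def context_switch(acu_regs, saved_contexts, old_task, new_task):
    """Save the live ACU registers for old_task and load those of new_task.

    Returns the number of register transfers (saves plus restores).
    """
    saves = len(acu_regs)
    saved_contexts[old_task] = list(acu_regs)   # save the halted task's registers
    restored = saved_contexts[new_task]
    acu_regs[:] = restored                      # restore the activated task's registers
    return saves + len(restored)


# One ACU with four live registers; task "B" was saved earlier.
acu = [1, 2, 3, 4]
saved = {"B": [10, 20, 30, 40]}
transfers = context_switch(acu, saved, old_task="A", new_task="B")
```

With four registers per ACU the sketch counts eight transfers for a single ACU, and the count scales linearly with the number of ACUs involved in the switch.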