Software pipelining is a compiler technique that transforms a loop described in a high-level programming language, such as C or FORTRAN, in such a way that successive iterations of the loop execute in an overlapped rather than sequential fashion. This technique exposes the instruction-level parallelism (ILP) available between successive loop iterations to the compiler and to the processor executing the transformed code.
One side effect of this transformation is that the successive lifetimes of some loop variables now overlap in time. Variables that were modeled as scalar values (virtual registers) before the transformation now typically need to be modeled as vectors of scalar values (Expanded Virtual Registers, or EVRs). The successive values stored in an EVR represent the successive values in time of the original virtual register.
One compiler technique used to remedy this side effect is called Modulo Variable Expansion (MVE). It unrolls the code of the transformed loop a number of times equal to the largest number of overlapped lifetimes found for any single loop variable, so that each overlapping lifetime can be given its own register name. This, however, typically increases code size significantly.
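As an illustration of the unrolling MVE performs, the following C sketch shows a simple loop and a hand-pipelined version in which the load for the next iteration is issued before the current iteration finishes. The function names `scale_ref` and `scale_mve`, the choice of loop body, and the two-way overlap are assumptions made for this example only; a real compiler operates on machine operations rather than on C source, and in sequential C the overlap is only notional.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one value t is defined and used per iteration. */
static void scale_ref(const int *x, int *y, size_t n, int c)
{
    for (size_t i = 0; i < n; i++) {
        int t = x[i];
        y[i] = t * c;
    }
}

/* Hypothetical software-pipelined form after modulo variable expansion.
 * The load for iteration i+1 is placed before the multiply/store of
 * iteration i, so two instances of t are live at the same time. The
 * kernel is therefore unrolled twice and alternates between t0 and t1. */
static void scale_mve(const int *x, int *y, size_t n, int c)
{
    assert(x != NULL && y != NULL);
    if (n == 0)
        return;

    size_t i = 0;
    int t0 = x[0];              /* prologue: load for iteration 0 */
    int t1;

    while (i + 2 < n) {         /* kernel, unrolled by two */
        t1 = x[i + 1];          /* load for iteration i+1 */
        y[i] = t0 * c;          /* finish iteration i */
        t0 = x[i + 2];          /* load for iteration i+2 */
        y[i + 1] = t1 * c;      /* finish iteration i+1 */
        i += 2;
    }

    /* Epilogue: t0 holds x[i]; one or two iterations remain. */
    if (i + 1 < n) {
        t1 = x[i + 1];
        y[i] = t0 * c;
        y[i + 1] = t1 * c;
    } else {
        y[i] = t0 * c;
    }
}
```

Both functions produce the same output for any `n`. The pipelined version needs two copies of the kernel body plus a prologue and an epilogue, which is the code growth referred to above.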
Proper architectural support may obviate the need for MVE and code expansion. One approach is called a Rotating Register File (RRF). In an RRF, a configurable and contiguous set of registers is defined as the rotating section of the RRF and holds the EVRs. The other registers are unaffected and typically hold the variables that are not modified during the loop. Control registers (outside of the register file) define the position and size of the rotating section, as well as the current base within it.
As an example, assume in the following that the rotating section starts at address 0 of the register file. When a register in the rotating section is accessed, its address, decoded from the instruction, is added to a base address, modulo the rotating section length. The base address is decremented by one each time an iteration terminates, as indicated by the execution of a specific branch instruction. The net effect of this renaming scheme is that a value defined at address A in iteration n appears at address A+1 in iteration n+1. The successive temporal values of a variable appear to shift through the successive registers allocated to its EVR. When the base is decremented, the register file is said to rotate.
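The renaming scheme just described can be modeled in a few lines of C. This is a minimal behavioral sketch, assuming a rotating section that starts at address 0; the type `rrf_state` and the functions `rrf_rename` and `rrf_rotate` are hypothetical names introduced for illustration and do not correspond to any particular processor.

```c
#include <assert.h>

/* Hypothetical model of a rotating section that starts at address 0. */
typedef struct {
    unsigned length;  /* number of registers in the rotating section */
    unsigned base;    /* current base, always in [0, length) */
} rrf_state;

/* Map a decoded register address to a physical register:
 * (address + base) modulo the rotating section length. */
static unsigned rrf_rename(const rrf_state *s, unsigned addr)
{
    assert(s->length > 0 && s->base < s->length && addr < s->length);
    return (addr + s->base) % s->length;
}

/* Rotate the file at the end of an iteration: decrement the base,
 * wrapping around to length - 1 when it is already 0. */
static void rrf_rotate(rrf_state *s)
{
    assert(s->length > 0);
    s->base = (s->base == 0) ? s->length - 1 : s->base - 1;
}
```

With a section of length 5 and a base of 0, address 2 maps to physical register 2. After one call to `rrf_rotate` the base is 4, and address 3 maps to (3 + 4) mod 5 = 2, the same physical register: the value written at address A is now visible at address A+1.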
The RRF suffers from implementation complexity, however, which may explain why it is rarely present in commercial products, and not at all in the embedded space, where cost and power dissipation are prime issues. The main contributor to this complexity is the register renaming logic, which is instantiated once for each port of the register file. This logic includes an addition (adding the decoded register address to the base address) producing an address+base value, followed by a subtraction (subtracting the rotating section length from address+base) producing an address+base−length value, followed by a selection (between address+base and address+base−length, depending on the sign of the subtraction result) yielding the final renamed address. Together, the subtraction and selection implement a modulo operation. Another selection is required between the decoded address and the renamed address, depending on whether or not the decoded address falls inside the rotating section. This logic also lengthens the critical path of register address decoding. This complexity may also explain why the rotating section length is often limited to multiples of eight, which can waste up to seven registers.
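The per-port renaming logic described above (adder, subtractor, and two selections) can be written out step by step. The following C function is only a behavioral sketch under the same assumption as before, namely that the rotating section occupies addresses 0 to length−1 and that the base lies in that range; the name `rename_port` and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical renaming datapath for one register-file port.
 * Assumes the rotating section occupies addresses [0, length)
 * and that 0 <= base < length. */
static int rename_port(int addr, int base, int length)
{
    assert(length > 0 && base >= 0 && base < length && addr >= 0);

    int sum  = addr + base;      /* addition: address+base */
    int diff = sum - length;     /* subtraction: address+base-length */

    /* First selection, driven by the sign of the subtraction result.
     * Together with the subtraction it implements the modulo. */
    int renamed = (diff < 0) ? sum : diff;

    /* Second selection: addresses outside the rotating section
     * bypass the renaming and are used as decoded. */
    return (addr < length) ? renamed : addr;
}
```

For a section of length 5 and a base of 4, `rename_port(3, 4, 5)` yields 2, while address 7, which lies outside the rotating section, is returned unchanged. In hardware, the chain of adder, subtractor, and selections sits in series with address decoding on every port, which is where the cost and timing impact arise.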