1. Field
The present application relates generally to computer processors and, more specifically, to mechanisms that support execution of software pipelined loops.
2. State of the Art
Computer processors execute operations on data. An individual data value (an operand) is produced by some producer operation, recorded, and then used later by one or more other consumer operations. The time between production and consumption by the last consumer is the lifetime of the operand. Operands vary widely in lifetime, but lifetimes can usually be loosely categorized into persistent (or global) lifetimes that last for an appreciable fraction of total program execution; local lifetimes that last for the duration of a function or several statements in the program; and transient lifetimes that last for only portions of a single expression in the program. These categories are not sharp, and programs exhibit a continuum of lifetimes, but the rough grouping is strong enough that computer hardware usually contains different storage means for operands of each category. For example, persistent operands may use a software-provided heap in memory, while local operands may use a hardware-assisted stack and transient operands use a wholly hardware register bank.
Transient operands are ubiquitous. For example, if the source program contains the expression “A+B+C”, then the computer will execute a first add operation on A and B, and then a second add operation that adds the result of the first add operation to C. The A+B result is typically transient and will be discarded as soon as it is consumed by the second add operation, although it may have a longer lifetime if the same A+B calculation appears elsewhere and the intermediate result can be reused.
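For illustration only, the lifetime of the A+B intermediate can be sketched in a few lines of Python. The operation list, the temporary names t1 and t2, and the function name lifetimes are hypothetical and are not taken from any particular processor or compiler. Lifetime is measured here simply as the number of operations between production and last consumption.

```python
# Hypothetical lowering of "A+B+C" into two add operations.
# Each entry is (destination, operation, sources).
OPS = [
    ("t1", "add", ("A", "B")),   # first add produces the transient t1
    ("t2", "add", ("t1", "C")),  # second add is the last consumer of t1
]

def lifetimes(ops):
    """Return, for each produced and consumed operand, the distance in
    operations between its producer and its last consumer."""
    produced = {}
    last_use = {}
    for idx, (dest, _op, srcs) in enumerate(ops):
        for src in srcs:
            if src in produced:
                last_use[src] = idx
        produced[dest] = idx
    return {name: last_use[name] - produced[name]
            for name in produced if name in last_use}
```

In this sketch t1 has a lifetime of one operation, after which its storage could be overwritten. The operand t2 has no consumer in the list, so no lifetime is reported for it.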
Many prior art computer processors employ a set of general registers, which are storage devices that can hold a single operand each. Machine operations like addition take their arguments from and deliver their result to registers. Thus, a register is the holding place for transient operands. When the lifetime of an operand ends, the register holding it can simply be overwritten by some other newly computed operand. Register usage by a program is very high because there are so many transients. Consequently, computer processor designers go to great lengths to ensure that access to registers is very fast and that there are enough registers to hold any reasonable transient population. Operands that do not fit in the available registers must be kept elsewhere, typically in memory, and access to such spilled operands takes tens to hundreds of times longer than access to a register. Because of the speed advantage of registers, registers not needed for transients are commonly used for frequently-referenced operands with more-than-transient lifetimes, even very long lived global operands. Each extra operand that can reside in the registers improves the speed of the program by avoiding lengthy memory access.
Optimizing compilers can employ an instruction scheduling technique known as software pipelining, which parallelizes a loop by overlapping the execution of different iterations of the loop. Rotating register space, as used by the SPARC and Itanium architectures, can be used by a software pipelined loop to hold operands across the iterations of the loop. This approach has several drawbacks. First, the registers of the rotating register space are typically sized to fit the largest possible operand, which can be quite large for vector data, but are frequently occupied by small operands, wasting hardware on unused storage. Second, the number of registers is fixed by design and cannot be dynamically configured based on the needs of the program. Third, rotating registers are typically used for computation temporaries as well as for loop-carried variables, so when the registers are rotated, dead registers rotate along with those that are live across iterations. This waste increases register pressure, and complex code may run out of registers.
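The behavior of a rotating register space in a software pipelined loop can be sketched as follows. This is a minimal Python model under stated assumptions, not a description of any specific architecture: the class RotatingRegisterFile, the function pipelined_double, the file size of four registers, and the two-stage loop are all hypothetical. Rotation is modeled by decrementing a base offset, so that a value written to logical register 0 in one iteration is read as logical register 1 in the next.

```python
class RotatingRegisterFile:
    """Toy model: logical register r maps to physical (base + r) mod size."""

    def __init__(self, size):
        self.phys = [None] * size
        self.base = 0

    def _index(self, logical):
        return (self.base + logical) % len(self.phys)

    def read(self, logical):
        return self.phys[self._index(logical)]

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def rotate(self):
        # After rotation, the value in logical r is visible as logical r + 1.
        # Every register shifts, whether its contents are live or dead.
        self.base = (self.base - 1) % len(self.phys)


def pipelined_double(xs):
    """Two-stage software pipeline: stage 1 loads xs[i] into logical r0,
    stage 2 doubles the value loaded by the previous iteration (now in r1)."""
    rf = RotatingRegisterFile(4)
    out = []
    for i in range(len(xs) + 1):          # one extra pass drains the pipeline
        if i < len(xs):
            rf.write(0, xs[i])            # stage 1 of iteration i
        if i >= 1:
            out.append(rf.read(1) * 2)    # stage 2 of iteration i - 1
        rf.rotate()
    return out
```

In this model only one value is live across each rotation, yet all four registers take part in the rotation, which illustrates how dead registers consume rotating space alongside live ones.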