The invention relates to a permute unit comprising permute logic and a crossbar, working in cycles defined by clocking signals and generating one valid output vector per cycle by processing two parallel input vectors per cycle according to a suitable scheme, and to a method of operating such a permute unit.
Modern microprocessors suffer from high power consumption, which is the sum of a static and a dynamic contribution; the static contribution is approximately proportional to the silicon area of a macro.
To reduce area within a microprocessor, i.e., to reduce the cost of silicon area and leakage power, double pumping is frequently used. When double pumping is applied to a unit within a microprocessor, that unit operates at twice the incoming data rate. This is typically achieved by adding register boundaries in the middle of a cycle. Assuming a doubled clock frequency and an identical word length, such a double pumped unit has twice the throughput of a non-double pumped unit. Hence, one double pumped unit can replace two non-double pumped units.
An example of a unit within a microprocessor is a vector unit (VMX). Typically, VMXs are implemented fully in parallel, i.e., they operate on a full word-length operand of, e.g., 128 bits per cycle. A single-instruction multiple-data (SIMD) VMX, such as the VMX of some IBM PowerPC and POWER processors, is well suited to double pumping, because N=4 identical computational units, e.g., integer, floating point and logical units, work in parallel on 32-bit input vectors. Neglecting any overhead for double pumping, half the area and leakage power could be saved by double pumping the computational units within a VMX.
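The equivalence between four parallel lanes and two double pumped lanes can be illustrated with a minimal behavioral sketch. It is not a hardware description: the names add32, fully_parallel and double_pumped are hypothetical, the 32-bit wrapping addition merely stands in for any lane-wise operation, and the loop over half-cycles only models the reuse of two units twice per full cycle.

```python
from typing import Callable, List

MASK32 = 0xFFFFFFFF  # each lane is assumed to be 32 bits wide


def add32(x: int, y: int) -> int:
    """Stand-in lane operation: 32-bit addition with wrap-around."""
    return (x + y) & MASK32


def fully_parallel(op: Callable[[int, int], int],
                   a: List[int], b: List[int]) -> List[int]:
    """Four units, each handling one lane during a single full cycle."""
    return [op(x, y) for x, y in zip(a, b)]


def double_pumped(op: Callable[[int, int], int],
                  a: List[int], b: List[int], units: int = 2) -> List[int]:
    """Two units, each used once per half-cycle (twice per full cycle)."""
    out: List[int] = []
    for half_cycle in range(len(a) // units):
        first = half_cycle * units
        # The same `units` physical units process the next group of lanes.
        out.extend(op(a[i], b[i]) for i in range(first, first + units))
    return out


a = [1, 0xFFFFFFFF, 3, 4]
b = [10, 1, 30, 40]
print(fully_parallel(add32, a, b) == double_pumped(add32, a, b))
```

Because each lane depends only on its own 32-bit inputs, the order in which lanes are processed does not affect the result, which is what makes the computational units amenable to double pumping.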
A permute unit performs vector permute operations, in which the bytes of a source operand are reordered in the target output. It is typically reused for other instructions requiring multiplexing or shifting operations, particularly those for which the size of additional multiplexers, or the size and delay of a barrel shifter, would be significant. An example of a permute unit in a vector unit is described in N. Maeding et al., “The vector fixed point unit of the synergistic processor element of the cell architecture processor”, Proc. of European Solid-State Circuits Conference (ESSCIRC) 2005, pp. 203-206.
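A byte reordering of this kind can be sketched in a few lines. The sketch is a simplification made for illustration only: it assumes a single 16-byte source operand and a 16-byte control vector whose low four bits select the source byte for each output position, and the name byte_permute is hypothetical rather than taken from any instruction set.

```python
def byte_permute(source: bytes, control: bytes) -> bytes:
    """Reorder the bytes of `source` as directed by `control`.

    Output byte i is the source byte whose index is given by the low
    four bits of control byte i (assumed encoding, for illustration).
    """
    if len(source) != 16 or len(control) != 16:
        raise ValueError("both operands must be 16 bytes (128 bits)")
    return bytes(source[c & 0x0F] for c in control)


source = bytes(range(0x10, 0x20))          # 0x10, 0x11, ..., 0x1F
reverse = bytes(range(15, -1, -1))         # control: 15, 14, ..., 0
broadcast = bytes([3] * 16)                # control: every lane picks byte 3

print(byte_permute(source, reverse).hex())
print(byte_permute(source, broadcast).hex())
```

The same selection mechanism covers byte reversal, broadcasting and byte-granular shifts, which is consistent with the reuse of the permute unit for multiplexing and shifting instructions.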
However, because the permute unit that is part of a vector unit operates on the full length of the input vectors, i.e., 128 bits, a double pumping scheme is usually not possible.
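One way to see the difficulty is to compare a full-width permute with a naive attempt to process each 64-bit half on its own. The sketch below is an illustrative assumption about what such a naive split would look like, not a description of any existing design; the names permute and halfwise_permute are hypothetical.

```python
def permute(source: bytes, control: bytes) -> bytes:
    """Full-width permute: every output byte may come from any source byte."""
    return bytes(source[c] for c in control)


def halfwise_permute(source: bytes, control: bytes, width: int = 8) -> bytes:
    """Naive split: each half-cycle sees only its own half of the source."""
    out = bytearray()
    for start in range(0, len(source), width):
        half = source[start:start + width]
        for c in control[start:start + width]:
            # Source bytes outside the current half are unreachable here.
            out.append(half[c % width])
    return bytes(out)


source = bytes(range(16))
identity = bytes(range(16))            # no byte crosses the half boundary
reverse = bytes(range(15, -1, -1))     # every byte crosses the half boundary

print(permute(source, identity) == halfwise_permute(source, identity))
print(permute(source, reverse) == halfwise_permute(source, reverse))
```

The two functions agree only when no output byte needs a source byte from the other half; a general permute gives no such guarantee, unlike the lane-wise computational units.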