Digital signal processing application code typically performs arithmetic processes on vectors, usually by combining data vectors with coefficient vectors. A common example is the process of convolution, but other vector processes share similar characteristics. The data and coefficient vectors are of varying size, and often quite long, so that it is not generally practical for a machine to implement the vector functionality literally. It has instead been found practical for a machine to operate directly on smaller pieces of the vectors.
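By way of illustration only, the following C sketch shows one way a long convolution might be carried out on small pieces of the vectors rather than on the whole vectors at once. The function name `fir_blocked`, the piece length `BLOCK` of four elements, and the 16-bit sample and 32-bit accumulator widths are assumptions made for this example and do not describe any particular machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 4 /* hypothetical short-vector length, chosen for illustration */

/*
 * Convolution restricted to fully overlapping positions:
 *   y[n] = sum over k of h[k] * x[n + taps - 1 - k],  0 <= n < out_len
 * x must hold out_len + taps - 1 samples.
 * The long output vector is produced BLOCK elements at a time.
 */
void fir_blocked(const int16_t *x, const int16_t *h, size_t taps,
                 int32_t *y, size_t out_len)
{
    assert(out_len % BLOCK == 0); /* sketch handles whole pieces only */

    for (size_t base = 0; base < out_len; base += BLOCK) {
        int32_t acc[BLOCK] = {0}; /* one accumulator per element of the piece */

        for (size_t k = 0; k < taps; ++k) {
            /* The same coefficient is applied to every element of the piece. */
            for (size_t lane = 0; lane < BLOCK; ++lane) {
                acc[lane] += (int32_t)h[k] * x[base + lane + taps - 1 - k];
            }
        }

        for (size_t lane = 0; lane < BLOCK; ++lane) {
            y[base + lane] = acc[lane];
        }
    }
}
```

The inner loop over `lane` is the portion that a short vector machine would be expected to perform as a single operation; the outer loops walk the long vectors piece by piece.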
In one known method, the Single Instruction Multiple Data (SIMD) technique applies the single operation specified by an instruction to each element of one or more short vectors of data. The registers of a SIMD machine are designed to hold such short data vectors; for example, a 64-bit register may contain four 16-bit data elements forming a short vector or part of a larger vector. SIMD techniques are an effective way of improving digital signal processor performance because they increase the number of operations the processor performs per cycle.
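The 64-bit register holding four 16-bit elements can be modelled, as a rough sketch, in portable C. The example below emulates a four-element addition on an ordinary 64-bit integer; a real SIMD machine would provide this as a single instruction. The names `pack4`, `lane_of` and `add4x16`, the placement of element 0 in the least significant bits, and the choice of wraparound (modulo 2^16) addition are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LANE_BITS  16
#define LANE_COUNT 4

/* Pack four 16-bit elements into one 64-bit word, element 0 in the low bits. */
uint64_t pack4(uint16_t e0, uint16_t e1, uint16_t e2, uint16_t e3)
{
    return (uint64_t)e0 | ((uint64_t)e1 << 16) |
           ((uint64_t)e2 << 32) | ((uint64_t)e3 << 48);
}

/* Extract one 16-bit element from the packed word. */
uint16_t lane_of(uint64_t reg, unsigned lane)
{
    assert(lane < LANE_COUNT);
    return (uint16_t)(reg >> (lane * LANE_BITS));
}

/*
 * Add corresponding 16-bit elements of a and b, each modulo 2^16.
 * The top bit of every element is masked off before the wide add so that
 * no carry can cross into the neighbouring element; the top bits are then
 * restored with an exclusive OR.
 */
uint64_t add4x16(uint64_t a, uint64_t b)
{
    const uint64_t top = 0x8000800080008000ULL;
    uint64_t low = (a & ~top) + (b & ~top);
    return low ^ ((a ^ b) & top);
}
```

One call to `add4x16` performs four additions, which is the sense in which the operations per cycle are increased. The masking is needed only because this sketch borrows a scalar adder; dedicated SIMD hardware would keep the elements separate without it.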
Digital signal processing programs therefore typically use a short vector SIMD machine to perform a long vector operation. To do so, the program must often interleave arithmetic operations with vector permutation operations. The permutations may be needed, for example, to gather elements from the longer vectors of the algorithm and supply them as operands to the short vector datapaths of the machine, or to distribute the result elements of a short vector operation to different locations in the long vector result.
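The interleaving of permutation and arithmetic steps might be sketched in C as follows. In this example, every `stride`-th element of a long vector is gathered into a four-element short vector, multiplied element by element with a coefficient vector, and the results are distributed back to the same strided positions of a long output vector. The type `vec4_t` and the functions `gather4`, `mul4`, `scatter4` and `scale_strided` are hypothetical names introduced for this sketch, and the products are assumed to fit in 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4

/* Hypothetical short-vector type standing in for a SIMD register. */
typedef struct { int16_t lane[LANES]; } vec4_t;

/* Permutation step: collect scattered elements of a long vector. */
static vec4_t gather4(const int16_t *src, const size_t idx[LANES])
{
    vec4_t v;
    for (size_t i = 0; i < LANES; ++i) v.lane[i] = src[idx[i]];
    return v;
}

/* Arithmetic step: element-wise multiply (products assumed to fit in 16 bits). */
static vec4_t mul4(vec4_t a, vec4_t b)
{
    vec4_t r;
    for (size_t i = 0; i < LANES; ++i) r.lane[i] = (int16_t)(a.lane[i] * b.lane[i]);
    return r;
}

/* Permutation step: distribute results to locations in the long result. */
static void scatter4(int16_t *dst, const size_t idx[LANES], vec4_t v)
{
    for (size_t i = 0; i < LANES; ++i) dst[idx[i]] = v.lane[i];
}

/* Scale `count` strided elements of `in`, writing to the same positions of `out`. */
void scale_strided(const int16_t *in, int16_t *out, size_t count,
                   size_t stride, vec4_t coeff)
{
    assert(count % LANES == 0); /* sketch handles whole short vectors only */

    for (size_t g = 0; g < count; g += LANES) {
        size_t idx[LANES];
        for (size_t i = 0; i < LANES; ++i) idx[i] = (g + i) * stride;

        vec4_t v = gather4(in, idx);   /* permute */
        v = mul4(v, coeff);            /* compute */
        scatter4(out, idx, v);         /* permute */
    }
}
```

Only `mul4` performs arithmetic on the data; the gather and distribute steps on either side of it are pure data movement, yet they occupy the same loop and are executed once per short vector.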