Processing applications, such as signal processing applications, typically require a number of data manipulations to be performed in succession on blocks of data. Data communication algorithms for encoding and transforming data, such as Fast Fourier Transform (FFT) algorithms, Viterbi algorithms and Turbo decoding algorithms, implement basic butterfly operations in an iterative manner. Each butterfly operation involves a rearrangement of a vector of data, followed by one or more arithmetic operations performed on the rearranged data. It is known to perform such processing operations by providing rearrangement operations and providing forwarding logic for forwarding results of the rearrangement operations to circuitry for performing arithmetic operations. Although the circuitry for performing the rearrangement operations and the circuitry for performing the arithmetic operations may execute in parallel, in known systems the data dependency between the rearrangement operation and the arithmetic operation causes a bottleneck in processing, particularly for processors having limited forwarding logic or for deeply-pipelined processor cores. Indeed, because the arithmetic operation cannot begin until the result of the rearrangement operation is available, this data dependency can lead to stalling of the computation and loss of performance. Furthermore, the provision of data forwarding logic to reduce this problem is costly to implement.
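By way of illustration only, the dependency between the rearrangement operation and the arithmetic operation of a butterfly stage may be sketched as follows. The sketch assumes a twiddle-free radix-2 butterfly (of the kind used in a Walsh-Hadamard transform) rather than any particular FFT, Viterbi or Turbo implementation, and the function names `rearrange`, `butterfly_stage` and `transform` are hypothetical names chosen for this example.

```python
def rearrange(vec, stride):
    """Rearrangement step: split `vec` into the two operand streams of a stage.

    Elements `stride` apart are paired; `upper` receives the first operand of
    each pair and `lower` the second.
    """
    upper, lower = [], []
    for base in range(0, len(vec), 2 * stride):
        upper.extend(vec[base:base + stride])
        lower.extend(vec[base + stride:base + 2 * stride])
    return upper, lower


def butterfly_stage(vec, stride):
    """One butterfly stage: rearrangement followed by dependent arithmetic."""
    # Step 1: rearrangement of the input vector.
    upper, lower = rearrange(vec, stride)

    # Step 2: arithmetic. These operations cannot start until step 1 has
    # produced `upper` and `lower`, which is the data dependency at issue.
    sums = [a + b for a, b in zip(upper, lower)]
    diffs = [a - b for a, b in zip(upper, lower)]

    # Write the results back to the positions the operands came from.
    out = list(vec)
    i = 0
    for base in range(0, len(vec), 2 * stride):
        for k in range(stride):
            out[base + k] = sums[i]
            out[base + stride + k] = diffs[i]
            i += 1
    return out


def transform(vec):
    """Apply butterfly stages iteratively (length assumed to be a power of two)."""
    stride = 1
    while stride < len(vec):
        vec = butterfly_stage(vec, stride)
        stride *= 2
    return vec
```

In this sketch every stage repeats the same pattern of a rearrangement followed by arithmetic on its result, so any latency between the two steps is incurred once per stage.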
Co-pending UK patent application number 0624774.6 filed on 12 Dec. 2006 describes an apparatus and method for performing rearrangement operations on data. In this system, Single Instruction Multiple Data (SIMD) processing logic is responsive to a rearrangement instruction to perform a selected rearrangement operation on a plurality of data elements in dependence upon a scalar parameter identifying a data element width for the data elements on which the selected rearrangement operation is performed.
SIMD is a technique for improving processing performance in applications involving highly repetitive operations. The SIMD technique allows the same operation (e.g. an arithmetic operation) to be performed substantially simultaneously on a plurality of data elements. The SIMD technique enables the number of iterations of a loop of a calculation to be reduced by incorporating multiple processing operations for each loop iteration. The SIMD technique typically uses “packed vectors”, which are data structures containing a plurality of data elements. The SIMD packed vector may be used as an argument for a particular instruction so that the instruction is independently performed substantially simultaneously on all of the plurality of data elements of the packed vector.
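The reduction in loop iterations may be illustrated by the following software model, which is a sketch only and does not represent any particular SIMD instruction set. The lane count `LANES` and the function names are assumptions made for this example, and the input length is assumed to be a multiple of `LANES`.

```python
LANES = 4  # assumed number of data elements held in one packed vector


def simd_add(a, b):
    """Model a single SIMD instruction: the same add applied to every lane."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]


def scalar_add_arrays(xs, ys):
    """Scalar loop: one data element is processed per iteration."""
    out, iterations = [], 0
    for x, y in zip(xs, ys):
        out.append(x + y)
        iterations += 1
    return out, iterations


def packed_add_arrays(xs, ys):
    """Packed loop: one packed vector of LANES elements per iteration."""
    out, iterations = [], 0
    for i in range(0, len(xs), LANES):
        out.extend(simd_add(xs[i:i + LANES], ys[i:i + LANES]))
        iterations += 1
    return out, iterations
```

For two input arrays of eight elements, the scalar loop in this model runs for eight iterations whereas the packed loop runs for two, while producing the same results.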
Processors employing SIMD processing store data elements from the packed vectors in a special set of registers. The parallel processing is performed by logic units and makes use of this special set of registers. However, significant re-ordering of data will typically be required to create packed vectors from input data elements in order to make a calculation amenable to SIMD processing. The required re-ordering can have an adverse effect on the SIMD code density because several program instructions may be required to perform each re-ordering operation.
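As a hedged illustration of such re-ordering, the following sketch assumes input data supplied as interleaved real and imaginary samples, a layout chosen purely for this example and not taken from any particular system. The function names `deinterleave` and `squared_magnitudes` are likewise hypothetical.

```python
def deinterleave(samples):
    """Re-order [re0, im0, re1, im1, ...] into two packed vectors.

    This models the re-ordering needed before lane-wise arithmetic can be
    applied; on real hardware it may take several instructions.
    """
    re = samples[0::2]  # every even-indexed element
    im = samples[1::2]  # every odd-indexed element
    return re, im


def squared_magnitudes(samples):
    """Compute re*re + im*im for each interleaved complex sample."""
    # Re-ordering: build packed vectors whose lanes line up element by element.
    re, im = deinterleave(samples)
    # Lane-wise arithmetic: the same operation applied to every lane.
    return [r * r + i * i for r, i in zip(re, im)]
```

Here the arithmetic itself is a single lane-wise expression, and the remaining work is re-ordering, which is the portion that tends to inflate the instruction count.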
Thus, there is a need to provide a mechanism for more efficiently implementing processing operations in order to alleviate the bottleneck due to data dependencies between the rearrangement operations and the arithmetic operations and to improve the code density of algorithms within a SIMD processing system.