1. Field of the Invention
The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for dynamic data driven alignment and data formatting in a floating point Single Instruction Multiple Data (SIMD) architecture.
2. Background of the Invention
In known systems, data parallel single instruction multiple data (SIMD) architectures with floating point support have either been floating point only type systems, in which all instructions must be of a floating point type and only floating point type instructions may access the register file, or have supported a polymorphic type system, in which both floating point and integer instructions may be included in the instruction stream and may execute on the register file(s). “Pure breed” floating point SIMD instruction sets provide the benefit that having a single data type and instruction type simplifies the implementation, due to the absence of variation in data type and instruction type. Pure breed floating point SIMD instruction sets have not, in practice, been utilized, however, because they have limited functionality. That is, with pure breed floating point SIMD instruction sets, i.e., floating point only instruction sets, there is no capability to encode Boolean values and control words, e.g., a “permute” control word. Because there are no Boolean floating point or vector operations, floating point only SIMD instruction sets limit the ability to perform data-parallel if-conversion and further limit the ability to compose complex conditions for data-parallel select operations. Moreover, such floating point only SIMD instruction sets have no compare operations, which limits the architecture's ability to perform comparisons and compose complex conditions for data-parallel select operations. Since there are no vector permute operations, the floating point only SIMD instruction sets are limited in their ability to perform dynamic data reorganization, such as handling unaligned accesses and other data reorganization operations.
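By way of illustration only, the following minimal C sketch models what is meant above by data-parallel if-conversion: a per-element branch is replaced by a compare that produces a Boolean mask and a select that chooses between two candidate results. The names LANES, cmp_gt, mask_and and select_lanes are hypothetical and are not taken from any instruction set; the scalar loops merely stand in for single vector operations.

```c
#include <stdbool.h>

#define LANES 4  /* assumed number of elements per vector (hypothetical) */

/* Per-element compare: mask[i] is true where a[i] > b[i]. */
static void cmp_gt(const float a[LANES], const float b[LANES], bool mask[LANES])
{
    for (int i = 0; i < LANES; i++)
        mask[i] = a[i] > b[i];
}

/* Boolean combination of two masks, used to compose a complex condition. */
static void mask_and(const bool p[LANES], const bool q[LANES], bool out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = p[i] && q[i];
}

/* Data-parallel select: out[i] = mask[i] ? t[i] : f[i], with no branch
 * in the vectorized form. */
static void select_lanes(const bool mask[LANES], const float t[LANES],
                         const float f[LANES], float out[LANES])
{
    for (int i = 0; i < LANES; i++)
        out[i] = mask[i] ? t[i] : f[i];
}
```

A scalar loop of the form "if (a[i] > b[i]) r[i] = x[i]; else r[i] = y[i];" would thus become a call to cmp_gt followed by select_lanes. Each of these steps relies on a Boolean mask, which is the kind of value a floating point only instruction set is described above as unable to encode.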
Because of these significant limitations, in practice, non floating point only SIMD instruction set architectures (ISAs), i.e., polymorphic ISAs, and paired floating point ISAs targeting complex arithmetic have been utilized. Paired floating point ISAs are limited in much the same manner as floating point only SIMD ISAs with regard to data-parallel if-conversion, flexible data arrangement, and handling of data of unknown alignment or known misalignment. Polymorphic ISAs allow multiple different data types to be overlaid in a single vector register file, i.e., both floating point and integer data types may be used. While polymorphic ISAs allow different data types, and thus different instruction types, i.e., floating point and integer, there is a large amount of overhead associated with handling different data types and different instruction types. For example, polymorphic ISAs typically require conversion of data from one type to another to perform certain operations, followed by a conversion back.
Several polymorphically typed instruction sets have implemented a byte-wise permute instruction. For example, in the Power ISA and AltiVec instruction sets, the vector permute (vperm) instruction is provided and in the Cell SPE instruction set, the shuffle byte (shufb) instruction is provided, all available from International Business Machines Corporation of Armonk, N.Y. These permute instructions can be used to dynamically align data elements, such as described in Gschwind et al., “Synergistic Processing in Cell's Multicore Architecture”, IEEE Micro, Vol. 26, No. 2, pages 10-24, 2006. That is, the permute instruction allows any byte to be moved to any other place in a vector. Thus, for example, two aligned vectors that together overlap a misaligned vector may be loaded and then shuffled to obtain an aligned value. Unfortunately, such a permute instruction cannot be performed as a floating point operation but instead requires that the floating point vector values be converted to integers, that the shuffle or permute operation be performed on the integer values, and that the result then be converted back to a floating point vector value.
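The permute-based alignment described above may be sketched in C as follows. This is a simplified software model under stated assumptions, not the definition of vperm or shufb: a 16 byte vector width (VLEN) is assumed, the control word simply indexes into the 32 byte concatenation of the two source vectors, and the helper names permute_bytes and load_unaligned are hypothetical. The buffer mem is assumed to begin on a vector boundary, and addr is a byte offset into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VLEN 16  /* assumed vector width in bytes (hypothetical) */

/* Byte-wise permute: out[i] = concat(a, b)[ctl[i] mod 32].
 * Any source byte may be moved to any destination position. */
static void permute_bytes(const uint8_t a[VLEN], const uint8_t b[VLEN],
                          const uint8_t ctl[VLEN], uint8_t out[VLEN])
{
    for (int i = 0; i < VLEN; i++) {
        uint8_t idx = ctl[i] & (2 * VLEN - 1);
        out[i] = (idx < VLEN) ? a[idx] : b[idx - VLEN];
    }
}

/* Obtain the VLEN bytes starting at byte offset addr by loading the two
 * aligned vectors that overlap the misaligned vector and permuting them.
 * Note that the second aligned vector is always read, even when addr is
 * already aligned, so it must lie within the buffer. */
static void load_unaligned(const uint8_t *mem, size_t mem_len, size_t addr,
                           uint8_t out[VLEN])
{
    size_t shift = addr % VLEN;    /* misalignment within the vector */
    size_t lo_addr = addr - shift; /* start of the first aligned vector */
    uint8_t lo[VLEN], hi[VLEN], ctl[VLEN];

    assert(lo_addr + 2 * VLEN <= mem_len);
    memcpy(lo, mem + lo_addr, VLEN);
    memcpy(hi, mem + lo_addr + VLEN, VLEN);

    /* The control word is computed at run time from the address,
     * which is what makes the alignment dynamic and data driven. */
    for (int i = 0; i < VLEN; i++)
        ctl[i] = (uint8_t)(shift + i);

    permute_bytes(lo, hi, ctl, out);
}
```

In this model the control word is an array of byte indices, i.e., integer data, which illustrates why such a permute does not fit a register file that holds only floating point values.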
While polymorphically typed instruction sets provide a byte-wise permute instruction that allows data to be realigned from an arbitrary byte-alignment position, such byte-wise data rearrangement is expensive and requires large area and power. Moreover, polymorphically typed vector environments are more complex to build, have higher verification costs, have more critical paths, and may require dynamic data conversions to account for possibly different internal representations being used for floating point and integer data types. Furthermore, while natural vector element alignment can be enforced by the compiler and suitable programming language bindings, alignment of vectors at vector alignment boundaries cannot be enforced by the compiler and bindings because vectors are not data types in most programming languages, and hence programmers are unable to affect the alignment of vectors directly. Even when vectors are made available as language extensions, programmers may find that algorithmic requirements force the use of vector subranges that may not be naturally aligned with respect to hardware vector register lengths. A value is said to be naturally aligned in memory if its memory byte address is a multiple of its data size in bytes, e.g., a 2 byte data item is naturally aligned if its memory address is divisible by 2, a 4 byte data item is naturally aligned if its memory address is divisible by 4, and so forth.
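The definition of natural alignment given above reduces to a single divisibility test, which may be sketched in C as follows. The helper name is_naturally_aligned is hypothetical, and addresses are treated as plain integers for the purpose of illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A value of the given size (in bytes) is naturally aligned if its
 * byte address is a multiple of that size. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0);
    return addr % size == 0;
}
```

For example, assuming an array of 4 byte elements starting at address 0x1000 and a 16 byte vector register, the element at address 0x1004 is naturally aligned as an element (0x1004 is divisible by 4), but a vector subrange beginning at that element is not naturally aligned with respect to the vector length (0x1004 is not divisible by 16).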
Data parallel floating point oriented SIMD ISAs are more efficient than polymorphically typed instruction sets because they reduce the cost of implementation for application optimized systems with a floating point focus. However, data parallel floating point oriented SIMD ISAs do not offer dynamic data driven alignment and dynamic data driven formatting because data realignment is not consistent with the operation of floating point data paths, i.e., (1) floating point numbers treat a consecutive number of bytes as a single entity without a capability to separately address bytes within a floating point number, and (2) there is no known representation for encoding a desired data rearrangement.