1. Technical Field
The present invention relates to a method and apparatus for data processing in general, and in particular to a method and apparatus for performing a permute instruction. Still more particularly, the present invention relates to a method and apparatus for performing a bit-aligned permute instruction within a data processing system.
2. Description of Related Art
The proliferation of multimedia applications has led to an increased demand for processors that have multimedia facilities. Examples of such processors are the PowerPC™ processors manufactured by the International Business Machines Corporation of Armonk, N.Y. The multimedia facility for the PowerPC™ processors is the vector multimedia extension (VMX).
For processors that have a vector-based processing architecture, such as the PowerPC™ processors, it is possible to use permute instructions to perform multiple lookup operations. Basically, each permute instruction can store data from two operands into a result vector in any desired order. Thus, in an architecture that employs, for example, 128-bit registers, permuted values from a table can be selectively loaded into one of the 128-bit registers with a single instruction. Because the register stores 16 bytes of data, 16 table lookup operations can be performed simultaneously.
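To make the table lookup use concrete, the following is a minimal sketch in Python, not the VMX implementation. The function name `lookup16`, the 32-entry table size, and the index encoding (bit value 0x10 selects the source register, the low four bits select the byte) are assumptions chosen for illustration.

```python
def lookup16(table: bytes, indices: bytes) -> bytes:
    """Perform 16 byte-wide lookups into a 32-entry table in one step.

    Hypothetical model: the table occupies two 16-byte source registers
    and the 16 indices play the role of the permute mask.
    """
    if len(table) != 32 or len(indices) != 16:
        raise ValueError("expected a 32-byte table and 16 one-byte indices")
    low, high = table[:16], table[16:]  # the two 128-bit source registers
    # Assumed encoding: 0x10 picks the register, the low four bits the byte.
    return bytes((high if i & 0x10 else low)[i & 0x0F] for i in indices)


# Example: a 32-entry table of squares (mod 256), 16 lookups in one call.
squares = bytes((n * n) & 0xFF for n in range(32))
result = lookup16(squares, bytes([3, 17, 5] + [0] * 13))
# result[:3] holds squares[3], squares[17] and squares[5]: 9, 33, 25
```

Each of the 16 result bytes is produced independently of the others, which is what allows hardware to carry out all 16 lookups in parallel.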
A permute instruction operates to fill a register with data values from any two other registers, and the data values can be specified in any order. Referring now to the drawings and in particular to FIG. 1, there is graphically illustrated the function of a permute instruction according to the prior art. As shown, a permute mask is stored in a register 31, and values that are to be used to form the final result are stored in data registers 32 and 33. The permute instruction uses the values of the permute mask in register 31 to assign corresponding values stored in registers 32 and 33 to a result register 34. Each of registers 31–34 is 16 bytes (i.e., 128 bits) long. The permute instruction enables any one of the 32 source bytes from data registers 32 and 33 to be mapped to any location within result register 34. In the example shown in FIG. 1, byte 1 of register 32 is mapped to byte 0 of result register 34, byte 14 of register 33 is mapped to byte 1 of result register 34, byte 18 of register 33 is mapped to byte 2 of result register 34, and so on.
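The byte-granular behavior described above can be modeled with a short Python sketch. It is an illustration under stated assumptions rather than a description of the hardware: the name `permute` is hypothetical, the two data registers are treated as one 32-byte concatenation, and only the low five bits of each mask byte are assumed to be used as the source index. The sample data and mask are made up and are not the values of FIG. 1.

```python
def permute(mask: bytes, src_a: bytes, src_b: bytes) -> bytes:
    """Byte-granular permute: result[i] = (src_a + src_b)[mask[i] & 0x1F]."""
    if not (len(mask) == len(src_a) == len(src_b) == 16):
        raise ValueError("all registers must be 16 bytes long")
    source = src_a + src_b  # the 32 candidate source bytes
    # Assumption: only the low five bits of each mask byte select a source.
    return bytes(source[m & 0x1F] for m in mask)


# Illustrative register contents (not taken from FIG. 1).
reg_a = b"ABCDEFGHIJKLMNOP"
reg_b = b"abcdefghijklmnop"
mask = bytes([1, 30, 18] + [0] * 13)
result = permute(mask, reg_a, reg_b)
# Index 1 selects reg_a[1], 30 selects reg_b[14], 18 selects reg_b[2].
```

Every selection moves a whole source byte to a whole result byte, so the mask can reorder or duplicate bytes but cannot address individual bits.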
However, the above-mentioned operation is limited in granularity to discrete, immutable 8-bit bytes. In other words, the above-mentioned operation does not permit a program to select an 8-bit value from register 32 that starts in the middle of a byte. Because finer granularity is often needed in specialized data processing, particularly in encryption algorithms, it would be desirable to provide an improved method and apparatus for performing a permute instruction.
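The limitation can be illustrated with a short Python sketch of what software must do when an 8-bit value does not start on a byte boundary. This is not the improved instruction; it only shows the kind of shift-and-mask work that a byte-granular permute cannot express. The function name `extract_byte_at_bit` and the bit numbering (bit 0 is the most significant bit of the register) are assumptions made for illustration.

```python
def extract_byte_at_bit(register: bytes, bit_offset: int) -> int:
    """Return the 8-bit value that starts at an arbitrary bit offset.

    Assumption: bit 0 is the most significant bit of the 128-bit register.
    """
    if len(register) != 16:
        raise ValueError("register must be 16 bytes long")
    if not 0 <= bit_offset <= 120:
        raise ValueError("an 8-bit field must start at bit 0 through 120")
    value = int.from_bytes(register, "big")
    shift = 128 - 8 - bit_offset  # bits remaining to the right of the field
    return (value >> shift) & 0xFF


reg = bytes([0xAB, 0xCD] + [0] * 14)
aligned = extract_byte_at_bit(reg, 0)     # 0xAB, a byte permute can do this
unaligned = extract_byte_at_bit(reg, 4)   # 0xBC, straddles two source bytes
```

The unaligned value draws its bits from two adjacent source bytes, which is exactly the case a permute restricted to whole bytes cannot select directly.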