1. Field of the Invention
This invention relates to computer processors and, more particularly, to performing byte-permutation and bit-shift operations in computer processors.
2. Description of the Related Art
Microprocessors have evolved to include a variety of features aimed at improving the speed and efficiency with which instructions are executed. At the same time, microprocessors have been designed around a variety of instruction set architectures. For example, the x86 architecture (also known as the IA-32 architecture) has enjoyed widespread acceptance and success in the marketplace. Accordingly, it is advantageous to design processors according to the x86 architecture. Such processors may benefit from the large body of software written for the x86 architecture: because such processors may execute that software, computer systems employing them may enjoy increased acceptance in the market.
Shuffle instructions are among the commonly implemented x86 instructions. A shuffle instruction relocates or reorders portions of an operand within that operand. Shuffle instructions may perform a variety of functions, such as packing, unpacking, byte interleaving, swizzling, and other byte permutations. Processor architectures that support the x86 instruction set generally include circuitry to perform shuffles using operands of up to 32 bytes.
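By way of illustration only, a byte permutation of the kind described above may be modeled in software as follows. This is a minimal sketch, not a description of any particular x86 instruction: the function name `shuffle_bytes` and the form of the `control` argument (one source index per destination byte) are assumptions made for the example.

```python
from typing import Sequence


def shuffle_bytes(operand: bytes, control: Sequence[int]) -> bytes:
    """Return a result whose byte i is operand[control[i]].

    Hypothetical model of a byte shuffle: `control` holds one source
    index per destination byte. Not tied to any specific instruction.
    """
    for idx in control:
        if not 0 <= idx < len(operand):
            raise ValueError("control index out of range")
    return bytes(operand[idx] for idx in control)


if __name__ == "__main__":
    src = bytes([0x11, 0x22, 0x33, 0x44])
    # Reverse the byte order of a 4-byte operand.
    print(shuffle_bytes(src, [3, 2, 1, 0]).hex())
    # Broadcast byte 0 to every position.
    print(shuffle_bytes(src, [0, 0, 0, 0]).hex())
```

In this model, a reversal, a broadcast, and an interleave differ only in the control values supplied, which is why a single permutation datapath may serve many shuffle-type operations.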
In addition to shuffle instructions, the x86 instruction set includes a number of shift instructions. Those skilled in the art are well aware of the wide-ranging uses of shift operations within processors generally. As processors have advanced, and the applications to which they are applied have become more sophisticated, extensions to the instruction set have been introduced. For example, the x86 Streaming SIMD Extensions (SSE) instruction set has been extended to include 128-bit shift instructions. While such instructions may be beneficial for particular applications, efficiently supporting them in a given processor may present some challenges. For example, even in an x86 architecture that has been configured to support 64-bit operands, a 128-bit shift operation may generally require circuitry to logically concatenate at least two 64-bit registers in order to accommodate a 128-bit operand. Depending upon the implementation, such an approach may introduce additional latency into the critical path.
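The concatenation of two 64-bit registers described above may be illustrated in software as follows. This is a minimal sketch under stated assumptions: the function name `shift_left_128` and the representation of the operand as a (`hi`, `lo`) pair of 64-bit halves are hypothetical and chosen only for the example. The sketch shows that bits shifted out of the low half must be carried into the high half, which is the cross-register dependency that may add latency in hardware.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1  # keeps a value within 64 bits


def shift_left_128(hi: int, lo: int, count: int) -> Tuple[int, int]:
    """Logically shift the 128-bit value hi:lo left by `count` bits.

    Hypothetical model: `hi` and `lo` stand in for two 64-bit registers
    that together hold one 128-bit operand.
    """
    if not 0 <= count < 128:
        raise ValueError("count must be in the range 0..127")
    if count == 0:
        return hi & MASK64, lo & MASK64
    if count >= 64:
        # Every surviving bit comes from the low half.
        return (lo << (count - 64)) & MASK64, 0
    # Bits leaving the top of `lo` enter the bottom of `hi`.
    carried = (lo & MASK64) >> (64 - count)
    new_hi = ((hi << count) | carried) & MASK64
    new_lo = (lo << count) & MASK64
    return new_hi, new_lo


if __name__ == "__main__":
    hi, lo = shift_left_128(0, 0x8000000000000000, 1)
    print(hex(hi), hex(lo))
```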
In addition, the x86 instruction set includes support for byte-level and bit-level shift operations. Shift operations may be either logical or arithmetic. An arithmetic right shift must sign-extend its result, filling the vacated high-order bits with copies of the sign bit, whereas a logical shift fills the vacated bits with zeros. Generally speaking, the logical and arithmetic classes of shift operations have been implemented using separate execution units, incurring higher costs in terms of circuit area and power consumption. In view of the above, an effective method and mechanism for performing shift operations is desired.
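For illustration, the difference between logical and arithmetic right shifts noted above may be modeled on a fixed-width operand as follows. This is a minimal sketch: the function names, the default 32-bit width, and the unsigned-integer representation of the operand are assumptions made for the example and do not describe any particular execution unit.

```python
def logical_shift_right(value: int, count: int, width: int = 32) -> int:
    """Shift right, filling the vacated high-order bits with zeros."""
    if not 0 <= count < width:
        raise ValueError("count must be in the range 0..width-1")
    mask = (1 << width) - 1
    return (value & mask) >> count


def arithmetic_shift_right(value: int, count: int, width: int = 32) -> int:
    """Shift right, filling the vacated high-order bits with the sign bit."""
    if not 0 <= count < width:
        raise ValueError("count must be in the range 0..width-1")
    mask = (1 << width) - 1
    value &= mask
    shifted = value >> count
    if value >> (width - 1):
        # Sign bit is set: force the `count` vacated high-order bits to ones.
        shifted |= mask & ~(mask >> count)
    return shifted


if __name__ == "__main__":
    x = 0x80000000  # sign bit set in a 32-bit operand
    print(hex(logical_shift_right(x, 4)))
    print(hex(arithmetic_shift_right(x, 4)))
```

In this model the two operations share the same underlying shift and differ only in the fill applied to the vacated bits.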