1. Field of the Invention
This invention is related to the field of processors and computer systems and, more particularly, to performing shift operations in a processor.
2. Description of the Related Art
The x86 architecture (also known as the IA-32 architecture) has enjoyed widespread acceptance and success in the marketplace. Accordingly, it is advantageous to design processors according to the x86 architecture. Such processors may benefit from the large body of software written for the x86 architecture: because such processors may execute that software, computer systems employing the processors may enjoy increased acceptance in the market due to the large amount of available software.
As computer systems have continued to evolve, a 64-bit address size (and sometimes a 64-bit operand size) has become desirable. A larger address size allows programs having a larger memory footprint (the amount of memory occupied by the instructions in the program and the data operated upon by the program) to operate within the memory space. A larger operand size allows for operating upon larger operands, or for more precision in operands. More powerful applications and/or operating systems may be possible using 64-bit address and/or operand sizes.
Among the commonly implemented x86 instructions are shuffle instructions, which are configured to relocate or reorder portions of an operand within that operand. Because these shuffle instructions are part of the x86 instruction set, processor architectures which support the x86 instruction set generally include circuitry to perform shuffles.
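By way of illustration only, the reordering performed by a shuffle may be sketched in software as follows. The sketch assumes a 64-bit operand treated as four 16-bit words and an 8-bit control value whose 2-bit fields each select a source word; the function name shuffle_words and its parameters are hypothetical and are not taken from any particular instruction definition.

```c
#include <stdint.h>

/*
 * Illustrative sketch (assumed layout): reorder the four 16-bit words of a
 * 64-bit operand. Bits [2i+1:2i] of `order` select which source word is
 * copied into destination word i.
 */
static uint64_t shuffle_words(uint64_t src, uint8_t order)
{
    uint64_t dst = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned sel = (order >> (2 * i)) & 0x3u;       /* source word index */
        uint64_t word = (src >> (16 * sel)) & 0xFFFFu;  /* extract that word */
        dst |= word << (16 * i);                        /* place in slot i   */
    }
    return dst;
}
```

In this sketch, a control value of 0xE4 leaves the operand unchanged, while 0x1B reverses the order of the four words.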
In addition to shuffle instructions, the x86 instruction set includes a number of shift instructions. Those skilled in the art are well aware of the wide-ranging uses of shift operations within processors generally. As processors have advanced, and the applications to which they are applied have become more sophisticated, extensions to the instruction set have been introduced. For example, the x86 Streaming SIMD Extensions (SSE) instruction set has been extended to include 128-bit shift instructions. While such instructions may be beneficial for particular applications, efficiently supporting such instructions in a given processor may present some challenges. For example, even in an x86 architecture which has been configured to support 64-bit operands, a 128-bit shift operation may generally require circuitry to logically concatenate at least two 64-bit registers in order to accommodate a 128-bit operand. Depending upon the implementation, such an approach may introduce additional latencies into the critical path.
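The concatenation issue noted above may be illustrated with a minimal software sketch of a 128-bit logical left shift built from two 64-bit halves. The type u128_t, the function shl128, and the bit-granular shift count are assumptions made for illustration; the sketch does not describe any particular hardware implementation. Note that bits shifted out of the low half must be carried into the high half, which is the cross-register dependency that may lengthen the critical path.

```c
#include <stdint.h>

/* Hypothetical representation: a 128-bit operand held as two 64-bit halves. */
typedef struct {
    uint64_t lo;  /* bits 63..0   */
    uint64_t hi;  /* bits 127..64 */
} u128_t;

/*
 * Illustrative sketch: logical left shift of a 128-bit value by n bits.
 * Shift counts of 128 or more are assumed to produce zero.
 */
static u128_t shl128(u128_t v, unsigned n)
{
    u128_t r = { 0, 0 };

    if (n >= 128) {
        return r;                      /* everything shifted out */
    }
    if (n == 0) {
        return v;                      /* avoid an undefined 64-bit shift by 64 */
    }
    if (n >= 64) {
        r.hi = v.lo << (n - 64);       /* low half moves entirely into high half */
    } else {
        r.hi = (v.hi << n) | (v.lo >> (64 - n));  /* carry bits across halves */
        r.lo = v.lo << n;
    }
    return r;
}
```

Each case is handled separately because shifting a 64-bit value by 64 or more is undefined in C, so the sketch never evaluates such a shift.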
In view of the above, an effective method and mechanism for performing shift operations is desired.