Multimedia and cryptographic applications are increasingly ubiquitous, driving demand for processor facilities that execute special instructions to accelerate them. In particular, the ability to quickly rearrange bytes in a general way, or to shift them left or right, speeds up many of these applications. These operations are typically performed by permute and shift instructions, respectively. Most instructions supported by a typical multimedia facility within a processor architecture require specialized hardware to decrease their execution time.
Shift operations are often performed using a barrel shifter. However, the area consumed by this approach grows as the shifts become wider. Wide shifts (64 to 128 bits), which are useful in multimedia operations, are very expensive to implement in hardware: both the area of a barrel shifter performing wide shifts and the latency it incurs may become significant. Increased area leads to increased energy consumption and to increased cost, owing to lower chip yields and greater cooling requirements.
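The growth in cost with shift width can be illustrated with a small software model. The sketch below assumes a logarithmic barrel shifter, in which each bit of the shift amount controls one stage of 2:1 multiplexers. The function names barrel_shift_left and mux_count are hypothetical, and the multiplexer count is a simplified estimate that ignores wiring, which in practice also contributes to area. It is not taken from any particular implementation.

```python
def barrel_shift_left(value: int, amount: int, width: int = 64) -> int:
    """Model a logarithmic barrel shifter (assumed structure, not a real design).

    Stage k conditionally shifts by 2**k when bit k of `amount` is set, so only
    the low log2(width) bits of `amount` are used.
    """
    if width <= 0 or width & (width - 1):
        raise ValueError("width must be a positive power of two")
    mask = (1 << width) - 1
    value &= mask
    stages = width.bit_length() - 1  # log2(width) stages
    for stage in range(stages):
        if (amount >> stage) & 1:
            # Bits shifted past the top of the word are discarded.
            value = (value << (1 << stage)) & mask
    return value


def mux_count(width: int) -> int:
    """Simplified estimate: `width` 2:1 muxes in each of log2(width) stages."""
    return width * (width.bit_length() - 1)
```

Under this model, mux_count(64) is 384 and mux_count(128) is 896, so doubling the shift width more than doubles the multiplexer count and adds one more stage of delay.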
Permute operations are often performed by executing a sequence of instructions or, more efficiently, by a unit incorporating a crossbar switch that can execute a special permute instruction. A crossbar switch can reorder an arrangement of bytes into a different arrangement and is useful in many applications, especially multimedia and cryptographic applications.
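The behavior of a byte-level crossbar can be sketched in software as follows. This is a minimal illustration, assuming that each output byte position independently selects one input byte through a control index. The function name permute_bytes and the form of the control vector are hypothetical and do not correspond to the encoding of any specific permute instruction.

```python
from typing import Sequence


def permute_bytes(source: bytes, control: Sequence[int]) -> bytes:
    """Model a byte crossbar: output byte i is source[control[i]].

    Any input byte may be routed to any output position, and the same input
    byte may be selected more than once.
    """
    n = len(source)
    for idx in control:
        if not 0 <= idx < n:
            raise ValueError(f"control index {idx} out of range for {n} bytes")
    return bytes(source[idx] for idx in control)
```

For example, permute_bytes(b"ABCD", [3, 2, 1, 0]) reverses the byte order and returns b"DCBA", while permute_bytes(b"ABCD", [0, 0, 2, 2]) replicates selected bytes and returns b"AACC". A sequence of ordinary instructions would need several mask, shift, and merge steps to achieve the same rearrangements.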