As processor technology advances, newer software code is also being generated to run on machines with these processors. Users generally expect and demand higher performance from their computers regardless of the type of software being used. Performance issues can arise from the kinds of instructions and operations that are actually performed within the processor. Certain types of operations require more time to complete because of their complexity and/or the type of circuitry needed. This provides an opportunity to optimize the way certain complex operations are executed inside the processor.
Media applications are drivers of microprocessor development. Accordingly, the display of images and playback of audio and video data, which are collectively referred to as content, have become increasingly popular applications for current computing devices. Such operations are computationally intensive, but offer a high level of data parallelism that can be exploited through an efficient implementation using various data storage devices, such as single instruction multiple data (SIMD) registers. A number of current architectures also require multiple operations, instructions, or sub-instructions (often referred to as “micro-operations” or “μops”) to perform various mathematical operations or data permutation operations on a number of operands, thereby diminishing throughput and increasing the number of clock cycles required to perform these operations.
A permute instruction is an existing SIMD data reorganization instruction on many architectures. Such an instruction writes selected data elements from one or more source locations into elements of a destination location. For example, the source and destination locations can be vector registers that each hold multiple data elements. However, such instructions offer little or no flexibility to vary the selection mechanism or to control the zeroing of permuted fields.
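For illustration only, the behavior of a permute operation can be modeled in scalar C as follows. This is a minimal sketch under stated assumptions, not the encoding or semantics of any particular instruction: the function name permute_u8, the per-element index array idx, and the zero_mask parameter (one bit per destination element, where a set bit forces that element to zero) are all hypothetical names introduced here. Indices are wrapped with a modulo so that the sketch never reads outside the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of a byte permute (hypothetical helper, not a real intrinsic).
 * For each destination element i:
 *   - if bit i of zero_mask is set, dst[i] is written as 0;
 *   - otherwise dst[i] receives the source element selected by idx[i].
 * Assumption: at most 32 elements, so one mask bit maps to one element.
 */
void permute_u8(uint8_t *dst, const uint8_t *src, const uint8_t *idx,
                size_t n, uint32_t zero_mask)
{
    assert(dst != src);        /* in-place use would read overwritten data */
    assert(n > 0 && n <= 32);  /* one mask bit per destination element */

    for (size_t i = 0; i < n; i++) {
        if ((zero_mask >> i) & 1u) {
            dst[i] = 0;                /* zeroing control for this element */
        } else {
            dst[i] = src[idx[i] % n];  /* selection: wrap index into range */
        }
    }
}
```

With a four-element source {10, 20, 30, 40}, an index array {3, 2, 1, 0}, and a zero mask of 0, the destination holds the source in reverse order. Setting bit 1 of the mask zeroes the second destination element and leaves the others as selected. A hardware permute would perform all of these element moves in parallel rather than in a loop.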