Single Instruction, Multiple Data (SIMD) is a technique for achieving data-level parallelism. SIMD operations reduce computing time by performing the same operation on a series of values at once, instead of performing the operation on each value in the series sequentially. Programs can take advantage of SIMD when the same operation is applied to a large number of data points, as in graphics processing, where the same operation is applied to many data points representing pixels on a display screen. Typically, a SIMD operation forms each element of the result vector register in parallel by performing an operation on the corresponding elements of two input vector registers, but some vector operations use only one input vector register, or use both vector registers and scalar registers.
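The element-wise behavior described above can be sketched in a few lines of Python. This is only a model of the semantics, not of the hardware: Python evaluates the lanes one after another, whereas a SIMD unit would form them in parallel. The function names (`scalar_add`, `simd_binary`, `simd_vector_scalar`) are hypothetical and chosen for illustration.

```python
from operator import add, mul


def scalar_add(a, b):
    """Sequential baseline: one addition per loop iteration."""
    result = []
    for i in range(len(a)):
        result.append(a[i] + b[i])
    return result


def simd_binary(op, va, vb):
    """Model of a two-input vector operation.

    Each result element is formed from the corresponding elements
    of the two input vectors (assumed to have the same width).
    """
    if len(va) != len(vb):
        raise ValueError("input vectors must have the same width")
    return [op(x, y) for x, y in zip(va, vb)]


def simd_vector_scalar(op, va, s):
    """Model of a vector operation that also takes a scalar operand."""
    return [op(x, s) for x in va]


pixels = [10, 20, 30, 40]
offsets = [1, 2, 3, 4]

# The vector model and the scalar loop produce the same values;
# the difference on real hardware is how many steps it takes.
brightened = simd_binary(add, pixels, offsets)
scaled = simd_vector_scalar(mul, pixels, 2)
```

In this model a single call stands in for a single vector instruction, while the `scalar_add` loop stands in for one scalar instruction per element.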
Computing devices commonly use vector processing instructions to perform SIMD operations on a series of data values stored in a vector register. These vector processing instructions are part of the instruction set of the computing device's processing unit. When the processing unit executes a vector processing instruction, it decodes the instruction and performs the indicated operation on the indicated vector register.
Generally, the processing unit must know the width of the vector when processing a vector processing instruction. The processing unit must also generally have access to a vector register that it can use to process the instruction. Typically, the vector processing instruction itself specifies the width of the vector. However, this means that the instruction set for the processing unit must include a separate vector processing instruction for each vector width on which the processing unit is to perform SIMD operations. For a processing unit that supports 64-wide vector processing instructions to also support 128-wide vector processing instructions, such as when a 128-wide vector register is added to a new implementation of the processing unit, new instructions must be added to the instruction set. The larger the instruction set for a processing unit, the longer it will take the processing unit to decode and perform instructions. Further, because the vector processing instructions specify a particular vector width, code written for a first processing unit that supports vector processing instructions of a certain width cannot be executed by another processing unit that does not support vector processing instructions of that width.
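The portability problem described above can be illustrated with a toy model, written under stated assumptions. `ToyProcessor` and the opcode names `VADD64` and `VADD128` are hypothetical and do not correspond to any real instruction set. Width is modeled here as a number of elements, which is also an assumption, since the text does not state the unit.

```python
class ToyProcessor:
    """Toy model of a processing unit whose vector opcodes encode the width."""

    def __init__(self, supported_widths):
        # One distinct opcode per supported width, e.g. "VADD64", "VADD128".
        # Supporting a new width means adding a new opcode to this table.
        self.opcodes = {f"VADD{w}": w for w in supported_widths}

    def execute(self, opcode, va, vb):
        # "Decode": look the opcode up in the instruction set.
        if opcode not in self.opcodes:
            raise ValueError(f"illegal instruction: {opcode}")
        width = self.opcodes[opcode]
        if len(va) != width or len(vb) != width:
            raise ValueError("operand width does not match opcode")
        # "Execute": element-wise add over the full vector width.
        return [x + y for x, y in zip(va, vb)]


old_unit = ToyProcessor([64])        # original implementation
new_unit = ToyProcessor([64, 128])   # adds a 128-wide vector register

data = list(range(128))

# Code written with the 128-wide opcode runs on the new unit...
result = new_unit.execute("VADD128", data, data)

# ...but the older unit has no such opcode and rejects it.
try:
    old_unit.execute("VADD128", data, data)
    ran_on_old_unit = True
except ValueError:
    ran_on_old_unit = False
```

In this sketch the opcode table grows by one entry for every width added, and a program that uses `VADD128` is tied to units that define it. Both effects follow directly from encoding the width in the instruction.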