1. Field
This disclosure relates generally to data processors, and more specifically, to data processors that execute instructions which create permutation values.
2. Related Art
Increased performance in data processing systems can be achieved by executing operations on multiple elements of a vector in parallel. One type of processor available today is the vector processor, which uses vector registers to perform vector operations. Although vector processors allow higher performance, they are also more complex and costly than processors that use scalar general purpose registers. Specifically, the vector register file of a vector processor typically includes N vector registers, each of which includes a bank of M registers for holding M elements. Another known type of processor is the single-instruction multiple-data (SIMD) scalar processor (also referred to as a "short-vector machine"), which allows limited vector processing using the existing scalar general purpose registers (GPRs). Therefore, although fewer elements can be processed per operation than in a vector processor, less hardware is required. However, current SIMD scalar processors incur a large overhead in transferring vector elements into the scalar registers for execution and in transferring multiple vector elements back to memory. Because these loads and stores of multiple vector elements between memory and registers are costly, the overhead limits the effective throughput of operations.
SIMD scalar processors typically execute vector permute instructions in which a permutation value is generated. Such instructions incur significant overhead because constant values must be inserted, and memory table lookup operations are often required to provide the desired constants. Additional processing is needed because the size of the memory table often does not match the number of elements in a single vector. Moreover, the number of data storage registers that can be devoted to holding portions of a constant value table for vector processing is limited.
These factors limit the usefulness and efficiency of permutation instructions for performing vector table lookup operations.
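As a rough illustration of this overhead, the following C sketch models a short-vector machine with a hypothetical 16-byte vector register and a two-source byte permute loosely patterned on AltiVec `vperm` semantics. The names `vec_t`, `vec_permute`, and `table_lookup64` are illustrative assumptions, not part of any disclosed instruction set. A single permute can serve as a table lookup only when the table fits in the 32 bytes that one permute addresses. A 64-entry table, whose size does not match that reach, needs four registers to hold the table constants, a second permute, and an extra select step.

```c
#include <stdint.h>
#include <string.h>

#define VLEN 16 /* assumed vector length in bytes (hypothetical 128-bit register) */

typedef struct { uint8_t b[VLEN]; } vec_t;

/* Two-source byte permute, modeled loosely on AltiVec vperm:
 * each control byte selects one of the 32 bytes of the concatenation {a, b}. */
static vec_t vec_permute(vec_t a, vec_t b, vec_t ctrl) {
    vec_t r;
    for (int i = 0; i < VLEN; i++) {
        uint8_t sel = ctrl.b[i] & 0x1F;           /* only the low 5 bits are used */
        r.b[i] = (sel < VLEN) ? a.b[sel] : b.b[sel - VLEN];
    }
    return r;
}

/* Hypothetical 64-entry byte table lookup. The table is twice the 32 bytes
 * one permute can address, so it occupies four vector registers and needs
 * two permutes plus a per-element select on bit 5 of each index. */
static vec_t table_lookup64(const uint8_t table[64], vec_t idx) {
    vec_t t0, t1, t2, t3;
    memcpy(t0.b, table,      VLEN);               /* entries  0..15 */
    memcpy(t1.b, table + 16, VLEN);               /* entries 16..31 */
    memcpy(t2.b, table + 32, VLEN);               /* entries 32..47 */
    memcpy(t3.b, table + 48, VLEN);               /* entries 48..63 */

    vec_t lo = vec_permute(t0, t1, idx);          /* candidates from entries 0..31 */
    vec_t hi = vec_permute(t2, t3, idx);          /* candidates from entries 32..63 */

    vec_t r;
    for (int i = 0; i < VLEN; i++)                /* bit 5 of the index picks lo or hi */
        r.b[i] = (idx.b[i] & 0x20) ? hi.b[i] : lo.b[i];
    return r;
}
```

In this model, every additional 32 bytes of table costs two more registers and another permute-and-select stage, which is one way the limited register budget for constant tables constrains vector table lookups.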