A processor may receive a set of addresses with which to perform a direct table lookup, retrieving the element data values stored at those addresses. In a simple case, the number of element data values to be looked up may be 32 and the table size may also be 32, so a single 32-to-32 permute operation may be used. In a more general case, however, the set of addresses may reference element data values in a table of arbitrary size larger than 32, or in different tables, and it may be necessary to search multiple tables for the element data values corresponding to the set of addresses. For example, if the processor receives a request to look up 32 element data values based on a set of addresses, the processor may need to search up to 32 tables to find them. This may be time-consuming and may require a large amount of memory.
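To illustrate why a larger table can multiply the work, the following is a minimal sketch, not the method described in this document. It assumes a 32-lane vector, models a 32-to-32 permute as a plain list operation, and splits a large table into 32-entry sub-tables, issuing one permute per sub-table that the addresses touch. The names `LANES`, `permute32`, and `lookup_multi_table` are hypothetical.

```python
from typing import List, Tuple

LANES = 32  # assumed vector width for this sketch


def permute32(table: List[int], idx: List[int]) -> List[int]:
    """Hypothetical model of a 32-to-32 permute: output lane i gets table[idx[i]]."""
    assert len(table) == LANES and len(idx) == LANES
    return [table[j] for j in idx]


def lookup_multi_table(big_table: List[int], addrs: List[int]) -> Tuple[List[int], int]:
    """Look up addrs in a table larger than 32 entries by splitting it into
    32-entry sub-tables and issuing one permute per sub-table touched.
    Returns the looked-up values and the number of permutes issued."""
    out = [0] * LANES
    permutes = 0
    for t in sorted({a // LANES for a in addrs}):
        sub = big_table[t * LANES:(t + 1) * LANES]
        sub = sub + [0] * (LANES - len(sub))  # pad a short final sub-table
        vals = permute32(sub, [a % LANES for a in addrs])
        permutes += 1
        # Keep only the lanes whose address falls in sub-table t.
        for i, a in enumerate(addrs):
            if a // LANES == t:
                out[i] = vals[i]
    return out, permutes
```

If all 32 addresses fall in one 32-entry sub-table, a single permute suffices; if each address falls in a different sub-table, this approach needs 32 permutes, matching the worst case described above.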
Further, the processor may be given a data vector to permute according to a control in order to update an output vector. If N element data values need to be permuted, this may involve N×N operations, which may also be time-consuming.
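The N×N cost can be seen in a straightforward compare-and-select model, sketched below under stated assumptions: each output lane is compared against every input lane, and a lane whose control value matches no input index keeps its previous output value. That "no match leaves the lane unchanged" convention and the name `naive_permute` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def naive_permute(data: List[int], control: List[int],
                  output: List[int]) -> Tuple[List[int], int]:
    """Update output so that output[i] = data[control[i]], using an
    exhaustive compare-and-select over every (output lane, input lane) pair.
    Assumption: a control value outside 0..N-1 leaves that lane unchanged.
    Returns the updated vector and the number of comparisons performed."""
    n = len(data)
    result = list(output)
    comparisons = 0
    for i in range(n):          # each output lane ...
        for j in range(n):      # ... is compared against each input lane
            comparisons += 1
            if control[i] == j:
                result[i] = data[j]
    return result, comparisons
```

For N = 4, for example, the loop performs 16 comparisons regardless of the control values, which is the quadratic growth noted above.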
Accordingly, there is a need for a system capable of efficiently performing a direct lookup and/or efficiently permuting a data vector.