Operations involving two operands, where one operand is a byte value and the other is a known constant, can often be computed with a look-up table: there are only 256 (= 2^8) possible byte values and corresponding outcomes, so the table is small. For computationally intensive operations (e.g., division, exponentiation), accessing a look-up table tends to be much faster than full hardware or software execution of the same operation. If the processor architecture supports memory loads with register-indexed offsets, the lookup can be performed in a single RISC instruction.
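By way of illustration only, the table-based approach might be sketched in C as follows. The helper names build_pow_table and pow_lookup, the choice of modular exponentiation as the costly operation, and the 32-bit entry width are all assumptions made for this example and are not taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: precompute table[b] = base^b mod modulus for every
 * possible byte value b, so that the costly operation runs only once. */
static void build_pow_table(uint32_t base, uint32_t modulus, uint32_t table[256])
{
    assert(modulus != 0);            /* a zero modulus is undefined */
    uint64_t value = 1 % modulus;    /* base^0, reduced */
    for (int b = 0; b < 256; ++b) {
        table[b] = (uint32_t)value;
        value = (value * base) % modulus;  /* 64-bit product cannot overflow */
    }
}

/* At run time the operation reduces to a single indexed load. */
static inline uint32_t pow_lookup(const uint32_t table[256], uint8_t exponent)
{
    return table[exponent];
}
```

Because the index is a byte, the table needs exactly 256 entries and no bounds check; on an architecture with register-indexed loads, pow_lookup would typically compile to one load instruction.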
On machines where the word length (e.g., 32 bits) is a multiple of the byte length (8 bits), several (e.g., 4) bytes can be packed into a single word, thus saving potentially valuable memory space in those applications where memory is limited. This byte-packing scheme can be applied both to a processor's own internal registers and to the memory that a processor accesses. However, where packed bytes are to be used for performing a table lookup, extracting the desired byte will normally require that a series of extra instructions be executed, which reduces efficiency.
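The extra work implied by byte packing can be seen in a short C sketch. The function names pack_bytes, extract_byte and packed_lookup are hypothetical, and the layout (byte 0 in the least significant position) is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four bytes into one 32-bit word; b0 occupies the low 8 bits. */
static inline uint32_t pack_bytes(uint8_t b3, uint8_t b2, uint8_t b1, uint8_t b0)
{
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16) |
           ((uint32_t)b1 << 8)  |  (uint32_t)b0;
}

/* Extract byte number `index` (0 = least significant) from a packed word.
 * This costs a shift and a mask before the byte can be used as an index. */
static inline uint8_t extract_byte(uint32_t word, unsigned index)
{
    assert(index < 4);
    return (uint8_t)((word >> (8u * index)) & 0xffu);
}

/* Table lookup driven by a packed byte: shift, mask, then indexed load. */
static inline uint32_t packed_lookup(const uint32_t table[256],
                                     uint32_t word, unsigned index)
{
    return table[extract_byte(word, index)];
}
```

On a typical RISC machine each call to packed_lookup would expand to roughly three instructions (shift, mask, load) rather than the single load needed for an unpacked byte, which is the efficiency loss described above.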
Encryption and decryption operations are becoming increasingly important in modern microprocessor applications. Encryption and decryption algorithms may be quite computationally intensive. Such algorithms are frequently used in portable or embedded applications where computing power is limited. Among the more popular block-cipher algorithms are Blowfish, Triple-DES and Rijndael.
All of these algorithms use a special array addressing operation, which requires a long instruction sequence to execute on current microprocessors. The operation is as follows:
    result = pointer0[offset0 >> 24]
           ^ pointer1[(offset1 >> 16) & 0xff]
           ^ pointer2[(offset2 >> 8) & 0xff]
           ^ pointer3[offset3 & 0xff];                    (1)

Four memory access operations involving packed look-up tables are dominant here. Each of these operations extracts one of the four bytes in a 32-bit word, zero extends the extracted byte and then adds it to a base pointer. The result of this indexing operation generates the memory address to be accessed. A significant speed-up of the encryption and decryption process can be achieved if this array access is performed faster.
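For reference, operation (1) can be written as a self-contained C function along the following lines. The function name lookup_xor and the assumption that each table holds 256 entries of 32 bits are illustrative only; the pointer and offset names follow operation (1).

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of operation (1): four table lookups, each indexed by a different
 * byte of a 32-bit word, combined with exclusive-OR.
 * Assumption: every table has 256 entries of 32 bits. */
static inline uint32_t lookup_xor(const uint32_t *pointer0, const uint32_t *pointer1,
                                  const uint32_t *pointer2, const uint32_t *pointer3,
                                  uint32_t offset0, uint32_t offset1,
                                  uint32_t offset2, uint32_t offset3)
{
    assert(pointer0 && pointer1 && pointer2 && pointer3);
    return pointer0[offset0 >> 24]             /* top byte: shift only      */
         ^ pointer1[(offset1 >> 16) & 0xff]    /* upper middle: shift, mask */
         ^ pointer2[(offset2 >> 8) & 0xff]     /* lower middle: shift, mask */
         ^ pointer3[offset3 & 0xff];           /* low byte: mask only       */
}
```

Each term needs a byte extraction, scaling of the index by the entry size and an address addition before the load can issue, which is why the full expression tends to expand into a long instruction sequence on a conventional processor.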
It is therefore desirable to make the memory access represented by operation (1) more efficient, such that the encryption and decryption application will thereby run faster and with greater power efficiency compared to present implementations of these algorithms.