Cryptology is a tool that relies on an algorithm and a key to protect information. The algorithm is a complex mathematical function and the key is a string of bits. There are two basic types of cryptology systems: secret key systems and public key systems. A secret key system, also referred to as a symmetric system, has a single key (“secret key”) that is shared by two or more parties. The single key is used both to encrypt and to decrypt information.
For example, the Advanced Encryption Standard (AES), also known as Rijndael, is a block cipher developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen, and adopted as an encryption standard by the United States government. AES was announced on Nov. 26, 2001 by the National Institute of Standards and Technology (NIST) as U.S. FIPS PUB 197 (FIPS 197).
AES has a fixed block size of 128 bits and a key size of 128, 192 or 256 bits. Key expansion using Rijndael's key schedule transforms a key of 128, 192 or 256 bits into 128-bit round keys for 10, 12 or 14 rounds, respectively (one round key per round, plus one initial round key that is XORed with the plaintext before the first round). The round keys are used to process the plaintext data in rounds as 128-bit blocks (viewed as 4-by-4 arrays of bytes) and convert them to ciphertext blocks. Typically, for a 128-bit input to the round (16 bytes), each byte is replaced by another byte according to a lookup table called the S-box. This portion of the block cipher is called SubBytes. Next, the rows of bytes (viewed as a 4-by-4 array) are cyclically shifted, or rotated, left by a particular offset (i.e. row zero by 0 bytes, row one by 1 byte, row two by 2 bytes and row three by 3 bytes). This portion of the block cipher is called ShiftRows. Then each of the columns of bytes is viewed as four coefficients of a polynomial over a finite field, GF(256) (also called Galois field 2^8), and transformed by an invertible linear transformation. This portion of the block cipher is called MixColumns. Finally, the 128-bit block is XORed with a round key to produce the 16-byte output of the round, which is called AddRoundKey. The output of the final round is the ciphertext block.
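The four round steps described above can be sketched in a few lines of Python. This is an illustrative, unoptimized sketch rather than a reference implementation. It assumes the state is held as a flat list of 16 bytes in column-major order, and it derives the S-box from the field inverse and affine map given in FIPS 197 instead of hard-coding the table. All function names (gf_mul, make_sbox, aes_round and so on) are hypothetical and chosen only for this example.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8), reducing modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse by brute-force search; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    return next(b for b in range(1, 256) if gf_mul(a, b) == 1)


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def make_sbox():
    """S-box entry = affine map (FIPS 197) applied to the field inverse."""
    sbox = []
    for a in range(256):
        v = gf_inv(a)
        sbox.append(v ^ rotl8(v, 1) ^ rotl8(v, 2) ^ rotl8(v, 3) ^ rotl8(v, 4) ^ 0x63)
    return sbox


SBOX = make_sbox()

# Assumed layout: the byte in row r, column c is stored at index r + 4*c.


def sub_bytes(state):
    """SubBytes: replace every byte through the S-box."""
    return [SBOX[b] for b in state]


def shift_rows(state):
    """ShiftRows: row r is rotated left by r byte positions."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    """MixColumns: multiply each column by the fixed matrix with rows
    (2 3 1 1), (1 2 3 1), (1 1 2 3), (3 1 1 2) over GF(2^8)."""
    out = []
    for c in range(4):
        a = state[4 * c: 4 * c + 4]
        for r in range(4):
            out.append(gf_mul(2, a[r]) ^ gf_mul(3, a[(r + 1) % 4])
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def add_round_key(state, round_key):
    """AddRoundKey: byte-wise XOR of the state with a 16-byte round key."""
    return [s ^ k for s, k in zip(state, round_key)]


def aes_round(state, round_key):
    """One full (non-final) round in the order described in the text."""
    return add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_key)
```

The sketch omits key expansion and the final round, which skips MixColumns. It is also far slower than a table-driven or hardware implementation, and is meant only to show what each step does to the data.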
On systems with 32-bit or larger words, it is possible to implement the AES cipher by converting the SubBytes, ShiftRows and MixColumns transformations into four 256-entry 32-bit tables, which together occupy 4096 bytes of memory. One drawback of a software implementation is performance. Software typically runs orders of magnitude slower than dedicated hardware, so it is desirable to have the added performance of a hardware/firmware implementation.
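The table-based approach can be illustrated with the following Python sketch, which builds four 256-entry tables of 32-bit words and uses them to compute one output column of a round with four lookups and four XORs. The names TE0 to TE3, round_column and TABLE_BYTES are hypothetical. The state layout (column-major, row 0 in the most significant byte of each word) is an assumption made for this example, and the S-box is derived on the fly rather than hard-coded.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def sbox_entry(a):
    """Field inverse (0 maps to 0) followed by the FIPS 197 affine map."""
    inv = next((b for b in range(1, 256) if gf_mul(a, b) == 1), 0)
    rot = lambda x, n: ((x << n) | (x >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]


def ror32(w, n):
    """Rotate a 32-bit word right by n bits."""
    return ((w >> n) | (w << (32 - n))) & 0xFFFFFFFF


# TE0[x] packs the MixColumns contribution of a row-0 byte after SubBytes:
# (2*S[x], S[x], S[x], 3*S[x]), with row 0 in the most significant byte.
TE0 = [(gf_mul(2, s) << 24) | (s << 16) | (s << 8) | gf_mul(3, s) for s in SBOX]
# The tables for rows 1..3 are byte rotations of TE0.
TE1 = [ror32(w, 8) for w in TE0]
TE2 = [ror32(w, 16) for w in TE0]
TE3 = [ror32(w, 24) for w in TE0]

# Four tables x 256 entries x 4 bytes per entry.
TABLE_BYTES = 4 * len(TE0) * 4


def round_column(state, j, round_key_word):
    """Compute output column j of one round as a 32-bit word.

    state is a list of 16 bytes, with row r, column c at index r + 4*c.
    ShiftRows is folded into the choice of which input column each row
    is read from.
    """
    return (TE0[state[0 + 4 * j]]
            ^ TE1[state[1 + 4 * ((j + 1) % 4)]]
            ^ TE2[state[2 + 4 * ((j + 2) % 4)]]
            ^ TE3[state[3 + 4 * ((j + 3) % 4)]]
            ^ round_key_word)
```

With this layout TABLE_BYTES evaluates to 4096, matching the memory figure quoted above. An implementation could store only one table and rotate at lookup time, trading memory for extra instructions per lookup.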
Typical straightforward hardware implementations using lookup memories, truth tables, binary decision diagrams or 256-input multiplexers are costly in terms of circuit area. Alternative approaches using finite fields isomorphic to GF(256) may be efficient in area but may also be slower than the straightforward hardware implementations.
Modern processors often include instructions for operations that are computationally intensive but offer a high level of data parallelism, which can be exploited through an efficient implementation using various data storage devices, for example single instruction multiple data (SIMD) vector registers. The central processing unit (CPU) may then provide parallel hardware to support processing vectors. A vector is a data structure that holds a number of consecutive data elements. A vector register of size M (where M is 2^k, e.g. 256, 128, 64, 32, . . . 4 or 2) may contain N vector elements of size O, where N=M/O. For instance, a 64-byte vector register may be partitioned into (a) 64 vector elements, with each element holding a data item that occupies 1 byte, (b) 32 vector elements to hold data items that occupy 2 bytes (or one “word”) each, (c) 16 vector elements to hold data items that occupy 4 bytes (or one “doubleword”) each, or (d) 8 vector elements to hold data items that occupy 8 bytes (or one “quadword”) each. The nature of the parallelism in SIMD vector registers could be well suited for the handling of secure hashing algorithms.
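The relationship N=M/O can be made concrete with a small Python sketch that reinterprets the same 64 bytes as elements of different widths. The function names lane_count and split_lanes are hypothetical, and little-endian byte order within each element is an assumption of this example, not something stated above.

```python
import struct

# struct format codes for unsigned elements of 1, 2, 4 and 8 bytes.
_FORMATS = {1: "B", 2: "H", 4: "I", 8: "Q"}


def lane_count(register_bytes, element_bytes):
    """Number of elements N = M / O for a register of M bytes and elements of O bytes."""
    if register_bytes % element_bytes:
        raise ValueError("element size must divide the register size")
    return register_bytes // element_bytes


def split_lanes(register, element_bytes):
    """View a bytes object as a list of unsigned little-endian elements."""
    n = lane_count(len(register), element_bytes)
    return list(struct.unpack("<%d%s" % (n, _FORMATS[element_bytes]), register))


if __name__ == "__main__":
    reg = bytes(range(64))  # stand-in for the contents of a 64-byte vector register
    for size in (1, 2, 4, 8):
        print(size, "byte elements:", lane_count(len(reg), size))
```

For a 64-byte register this reports 64, 32, 16 and 8 elements for element sizes of 1, 2, 4 and 8 bytes, matching cases (a) through (d) above.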
Other similar encryption algorithms may also be of interest. For example, the Rijndael specification per se is specified with various block and key sizes that may be any multiple of 32 bits, both with a minimum of 128 and a maximum of 256 bits. Another example is SMS4, a block cipher used in the Chinese National Standard for Wireless LAN WAPI (WLAN Authentication and Privacy Infrastructure). It also processes the plaintext data in rounds (32 of them) as 128-bit blocks in GF(256), but performs reductions modulo a different polynomial.
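The effect of reducing modulo a different polynomial can be sketched with a GF(256) multiply routine that takes the reduction polynomial as a parameter. In the Python sketch below, 0x11B is the AES polynomial from FIPS 197. The second polynomial, 0x11D, is only an illustrative stand-in for "a different irreducible polynomial" and is not claimed to be the one SMS4 uses. The function and constant names are hypothetical.

```python
AES_POLY = 0x11B      # x^8 + x^4 + x^3 + x + 1, as specified in FIPS 197
EXAMPLE_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, illustrative stand-in only


def gf256_mul(a, b, poly):
    """Multiply two bytes as polynomials over GF(2), reducing modulo poly.

    poly must be a degree-8 irreducible polynomial encoded with bit 8 set.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current multiple of a
        a <<= 1              # multiply a by x
        if a & 0x100:
            a ^= poly        # reduce when the degree reaches 8
        b >>= 1
    return result


def gf256_inv(a, poly):
    """Multiplicative inverse by exhaustive search; 0 has no inverse and maps to 0."""
    return next((b for b in range(1, 256) if gf256_mul(a, b, poly) == 1), 0)


if __name__ == "__main__":
    # The same operands give different products under different polynomials.
    print(hex(gf256_mul(0x80, 0x02, AES_POLY)))      # 0x1b
    print(hex(gf256_mul(0x80, 0x02, EXAMPLE_POLY)))  # 0x1d
```

Both polynomials define a field with 256 elements, and any two such fields are isomorphic. The byte-level results nonetheless differ, so hardware or tables built for one reduction polynomial cannot be reused unchanged for a cipher that uses another.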
To date, options that provide efficient space-time design tradeoffs and potential solutions to such complexities, performance limiting issues, and other bottlenecks have not been fully explored.