Cryptology is a tool that relies on an algorithm and a key to protect information. The algorithm is a complex mathematical function, and the key is a string of bits. There are two basic types of cryptology systems: secret key systems and public key systems. A secret key system, also referred to as a symmetric system, has a single key (“secret key”) that is shared by two or more parties. The single key is used to both encrypt and decrypt information.
For example, the Advanced Encryption Standard (AES), also known as Rijndael, is a block cipher developed by two Belgian cryptographers and adopted as an encryption standard by the United States government. AES was announced on November 26, 2001 by the National Institute of Standards and Technology (NIST) as U.S. FIPS PUB 197 (FIPS 197). Other encryption algorithms are also of interest.
Another example is SM4 (formerly known as SMS4), a block cipher used in the Chinese National Standard for Wireless LAN, WAPI (WLAN Authentication and Privacy Infrastructure). It processes the plaintext data as 128-bit blocks in 32 rounds, using arithmetic in the Galois field GF(2^8), also denoted GF(256), modulo an irreducible polynomial. The SM4 algorithm was invented by Professor LU Shu-wang, and was declassified by the Chinese government and issued in January 2006.
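As an illustration of arithmetic in GF(256) modulo an irreducible polynomial, the following is a minimal sketch of multiplication and inversion in that field. The function names are hypothetical, and the default reduction polynomial 0x11B (x^8 + x^4 + x^3 + x + 1) is the one specified for AES in FIPS 197; it is used here only as an assumption for demonstration, since the text above does not state which polynomial SM4 uses.

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply two GF(2^8) elements (ints 0..255) modulo `poly`.

    `poly` is a degree-8 irreducible polynomial encoded as a 9-bit integer.
    The default 0x11B is the AES polynomial, assumed here for illustration.
    """
    result = 0
    while b:
        if b & 1:           # add (XOR) the current multiple of a
            result ^= a
        a <<= 1             # multiply a by x
        if a & 0x100:       # degree reached 8: reduce modulo poly
            a ^= poly
        b >>= 1
    return result


def gf256_inv(a, poly=0x11B):
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 maps to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf256_mul(result, a, poly)
    return result


# Worked example from FIPS 197: {57} * {83} = {c1}
print(hex(gf256_mul(0x57, 0x83)))
```

Because addition in this field is a bitwise XOR and multiplication reduces to shifts and conditional XORs, the same operations can be expressed either as table lookups or as small logic networks, which is the kind of area and speed tradeoff that hardware implementations face.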
The input, output, and key of SM4 are each 128 bits. Each round modifies one of the four 32-bit words that make up the 128-bit block by XORing it with a keyed function of the other three words. Encryption and decryption have the same structure, except that the round key schedule for decryption is the reverse of the round key schedule for encryption. A software implementation of SM4 (in ANSI C) was published online by the Free Software Foundation in December 2009. One drawback of a software implementation is performance. Software typically runs orders of magnitude slower than dedicated hardware, so it is desirable to have the added performance of a hardware/firmware implementation.
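The round structure described above can be sketched as follows. This is not SM4 itself: `toy_round_function` and `ROUND_KEYS` are hypothetical placeholders standing in for the cipher's real keyed function and key schedule, which are not given here. The sketch only demonstrates that a block built from four 32-bit words, with one word updated per round, is decrypted by the same routine when the round keys are supplied in reverse order.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def toy_round_function(word):
    """Hypothetical placeholder mixing function; NOT SM4's real keyed function."""
    word = (word * 0x9E3779B1) & MASK32
    return word ^ rotl32(word, 13)


def process_block(words, round_keys):
    """Apply one round per key to a block of four 32-bit words.

    Each round XORs the oldest word with a keyed function of the other
    three. The final word order is reversed, so the same routine decrypts
    when given the round keys in reverse order.
    """
    x0, x1, x2, x3 = words
    for rk in round_keys:
        new_word = x0 ^ toy_round_function(x1 ^ x2 ^ x3 ^ rk)
        x0, x1, x2, x3 = x1, x2, x3, new_word
    return (x3, x2, x1, x0)


# Hypothetical 32-entry round key list (a real cipher derives it from the key).
ROUND_KEYS = [(0x01234567 * (i + 1)) & MASK32 for i in range(32)]

block = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
ciphertext = process_block(block, ROUND_KEYS)
recovered = process_block(ciphertext, ROUND_KEYS[::-1])
assert recovered == block
```

Note that the round function does not need to be invertible for decryption to work; the XOR structure alone guarantees that each round can be undone, which is why a single datapath can serve both directions.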
Typical straightforward hardware implementations using lookup memories, truth tables, binary decision diagrams, or 256-input multiplexers are costly in terms of circuit area. Alternative approaches using finite fields isomorphic to GF(256) may be efficient in area but may also be slower than the straightforward hardware implementations.
Modern processors often include instructions to provide operations that are computationally intensive, but offer a high level of data parallelism that can be exploited through an efficient implementation using various data storage devices, such as, for example, single instruction multiple data (SIMD) vector registers. The central processing unit (CPU) may then provide parallel hardware to support processing vectors. A vector is a data structure that holds a number of consecutive data elements. A vector register of size M (where M is 2^k, e.g. 512, 256, 128, 64, 32, . . . 4 or 2) may contain N vector elements of size O, where N=M/O. For instance, a 64-byte vector register may be partitioned into (a) 64 vector elements, with each element holding a data item that occupies 1 byte, (b) 32 vector elements to hold data items that occupy 2 bytes (or one “word”) each, (c) 16 vector elements to hold data items that occupy 4 bytes (or one “doubleword”) each, or (d) 8 vector elements to hold data items that occupy 8 bytes (or one “quadword”) each. The nature of the parallelism in SIMD vector registers could be well suited for the handling of block cipher algorithms.
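The relationship N=M/O can be illustrated with a short sketch that models a vector register as a byte string and splits it into equally sized elements. The function name `partition_register` is hypothetical, and the little-endian interpretation of each element is an assumption made for the example rather than something stated above.

```python
def partition_register(register, element_size):
    """Split a register (bytes) into N = M / O elements of `element_size` bytes.

    Each element is returned as an unsigned integer, read little-endian
    (an assumption chosen for this illustration).
    """
    if len(register) % element_size != 0:
        raise ValueError("register size must be a multiple of element size")
    return [
        int.from_bytes(register[i:i + element_size], "little")
        for i in range(0, len(register), element_size)
    ]


# Model a 64-byte (512-bit) vector register holding the byte values 0..63.
register = bytes(range(64))

for name, size in (("byte", 1), ("word", 2), ("doubleword", 4), ("quadword", 8)):
    elements = partition_register(register, size)
    print(f"{name:>10}: {len(elements)} elements of {size} byte(s)")
```

With an element size of 16 bytes, the same 64-byte register holds four 128-bit elements, which matches the block size of ciphers such as AES and SM4 and suggests how several blocks might be processed side by side.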
To date, options that provide efficient space-time design tradeoffs and potential solutions to such complexities, performance limiting issues, and other bottlenecks have not been fully explored.