1. Field of the Invention
The present invention relates generally to encryption and decryption. The invention relates more specifically to a unique software implementation of the round function of the Data Encryption Standard to reduce required computation.
2. Description of the Related Art
An encryption algorithm provides data confidentiality by disguising data such that an outsider who obtains the disguised data cannot recover the original data with a reasonable amount of time and effort. The Data Encryption Standard (DES), described in U.S. Pat. No. 3,962,539, is a highly popular symmetric-key encryption algorithm. The DES encryption algorithm accepts a 64-bit plaintext block P and a 56-bit key K as inputs, and the algorithm outputs a 64-bit ciphertext block C. The plaintext block P is the data to be disguised, and the ciphertext block C is the disguised result. Similarly, the DES decryption algorithm accepts a 64-bit ciphertext block C and the 56-bit secret key K as inputs, and the decryption algorithm outputs the 64-bit plaintext block P. FIGS. 1 and 2 illustrate DES encryption and decryption, respectively. Encryption is denoted as DES and decryption is denoted as DES⁻¹.
DES is a symmetric-key cipher because the decryption key is equivalent to the encryption key. If DES is secure, an outsider cannot easily recover P given C without knowledge of the secret key K. In addition, an outsider cannot easily discover the secret key K given a plaintext block P and the corresponding ciphertext block C encrypted under K.
In the context of available computing power, DES suffers from insufficient key length. Given a plaintext P and the corresponding ciphertext C for some key K, an outsider can recover the secret key K by observing the results of the DES encryption of P using all possible values for K. This brute-force attack can be completed in a short period of time using a reasonable amount of computer hardware. To prevent such an attack, some communications and storage security systems employ Triple DES (3DES). Triple DES provides a larger effective key length than DES by sequentially encrypting a plaintext block with DES three times using three different keys.
DES encrypts a plaintext block in three steps, as illustrated in FIG. 3. First, DES performs a fixed initial permutation on the bits of the 64-bit plaintext block. The result of this permutation is then subjected to 16 identical rounds of permutation and substitution operations. The jth DES round, where 1≦j≦16, employs a 48-bit round key RK(j) that is deterministically generated from K. Lastly, DES performs a fixed final permutation on the output of the sixteenth round. The result of this final permutation is the 64-bit ciphertext block. Decryption proceeds in the same manner, but the 16 round keys generated from K are used in reverse order. More specifically, the jth round in DES decryption uses the round key RK(17−j).
FIG. 4 depicts the jth DES encryption round. The values L(j) and R(j) are the leftmost 32 bits and the rightmost 32 bits of the 64-bit input to the jth round, respectively. K is the 56-bit DES secret key. The encryption round proceeds as follows. R(j) is subjected to the Expansion Permutation, a fixed mapping of the 32 bits of R(j) to a 48-bit output. Since the number of bits in the output is greater than that of the input, some input bits are mapped to multiple output bits. The 48-bit result of the Expansion Permutation is then XORed with the output of the round key generation function F. F is a function that accepts the 56-bit DES secret key K and the round number j as inputs; F outputs a 48-bit result, RK(j), which is the round key. The result of the XOR operation between the round key and the output of the Expansion Permutation is then divided into eight 6-bit blocks. These 6-bit blocks are applied as inputs to the eight DES S-boxes. Each S-box accepts a 6-bit input and outputs a 4-bit result, and the S-boxes represent fixed nonlinear functions of the input bits. The 32 output bits of the S-boxes are then subjected to the DES P-box Permutation. The P-box Permutation is a fixed bijective permutation that maps 32 input bits to 32 output bits. Lastly, the output of the P-box is XORed with L(j), and this 32-bit result is R(j+1) in the next round. The value of L(j+1) in the next round is simply the value of R(j) from the current round.
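As an illustration of the round structure and the reversed round-key order described above, the following Python sketch runs a 16-round network of this form. It is a minimal sketch under stated assumptions and is not DES itself: the names feistel, toy_f, and ROUND_KEYS are hypothetical, toy_f merely stands in for the Expansion Permutation, round key XOR, S-boxes, and P-box, and the initial and final permutations are omitted. The sketch also assumes that the two 32-bit halves are exchanged after the last round, a detail of DES that the description above does not state.

```python
MASK32 = 0xFFFFFFFF

def toy_f(r, rk):
    # Placeholder round function (NOT the DES round function): any
    # deterministic 32-bit function of R(j) and RK(j) works here.
    return ((r * 0x9E3779B1) ^ rk ^ (r >> 7)) & MASK32

def feistel(block64, round_keys, f=toy_f):
    left, right = block64 >> 32, block64 & MASK32
    for rk in round_keys:
        # L(j+1) = R(j);  R(j+1) = L(j) XOR f(R(j), RK(j))
        left, right = right, left ^ f(right, rk)
    # Assumption: the halves are exchanged after the last round.
    return (right << 32) | left

# Hypothetical 48-bit round keys; a real implementation derives them from K.
ROUND_KEYS = [(0x0123456789AB * (j + 1)) & 0xFFFFFFFFFFFF for j in range(16)]

def encrypt(block64):
    return feistel(block64, ROUND_KEYS)

def decrypt(block64):
    # Same procedure; round j uses RK(17 - j), i.e. the keys in reverse order.
    return feistel(block64, ROUND_KEYS[::-1])
```

Because each round only XORs a function of one half into the other half, running the same loop with the round keys reversed undoes the rounds one at a time, regardless of what the round function computes.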
Implementing permutations in hardware is trivial: Wires representing input bits can simply be connected to wires representing the output bits without using any gates or other logic. In software, however, performing bit-level permutation is a difficult task. Instruction set architectures for existing general-purpose microprocessors do not include instructions that can be used to efficiently complete such permutations. Performing a bit-level permutation of an n-bit register can require as many as O(n) instructions on general-purpose RISC or CISC microprocessors.
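The O(n) cost noted above can be seen in a straightforward table-driven permutation routine. The following Python sketch is illustrative only; the function name permute and the convention that bit 1 is the most significant bit are assumptions made for this example.

```python
def permute(value, table, in_width):
    """Return the bits of `value` rearranged according to `table`.

    table[i] gives the 1-based position (bit 1 = most significant bit of an
    in_width-bit word) of the input bit that becomes output bit i + 1.
    """
    out = 0
    for src in table:
        # One shift, one mask, and one OR per output bit: O(n) work overall.
        out = (out << 1) | ((value >> (in_width - src)) & 1)
    return out

# Example: reversing the bit order of an 8-bit value.
REVERSE8 = [8, 7, 6, 5, 4, 3, 2, 1]
```

Each output bit costs several instructions in such a loop, which is the overhead that table lookups and wider-granularity operations are meant to avoid.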
The round key computation function F is difficult to compute in software, for F involves a bit-level permutation. In practice, however, the function F is rarely executed. A DES secret key K will often be used to encrypt/decrypt hundreds, thousands, or even millions of bytes of data. The 16 round keys corresponding to a secret key K can be computed once prior to the encryption/decryption of all the 64-bit blocks of the data rather than once for each 64-bit block. Hence, the computation cost of calculating F is usually negligible; the computation is amortized over the encryption/decryption of many data blocks.
The S-boxes are usually implemented as lookup tables in software. We refer to the input to an S-box lookup table as the index, and we refer to the possible outputs of the S-box as the table entries. To eliminate processing associated with the P-box Permutation, software implementations of DES often combine the P-box Permutation with the S-box lookup tables to form SP-box lookup tables. Each S-box outputs a 4-bit value, but the SP-boxes output a 32-bit or larger value in which the 4-bit S-box result is already permuted per the P-box Permutation. Hence, no explicit run-time processing is needed to complete the P-box Permutation: The permutation is built into the eight SP-box lookup tables. If the SP-box outputs are 32 bits in size, the 28 output bits that do not represent bits of the original 4-bit S-box output are set to zeroes. The results of the eight SP-box outputs can be combined by performing seven bitwise logical XOR or bitwise logical OR operations following the eight SP-box table lookups.
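The construction of SP-box tables can be sketched as follows. This Python example is a minimal sketch under stated assumptions: the S-box and P-box contents are seeded random placeholders rather than the actual DES constants, each S-box is indexed directly by its 6-bit input, and all function names are hypothetical. It shows only that eight SP-box lookups combined with seven XORs give the same result as eight S-box lookups followed by an explicit permutation.

```python
import random

def permute(value, table, in_width):
    out = 0
    for src in table:  # bit 1 = most significant bit
        out = (out << 1) | ((value >> (in_width - src)) & 1)
    return out

_rng = random.Random(0)
# Placeholder tables (NOT the DES constants): eight 6-bit -> 4-bit S-boxes
# and a random bijective 32-bit permutation.
S_BOXES = [[_rng.randrange(16) for _ in range(64)] for _ in range(8)]
P_TABLE = _rng.sample(range(1, 33), 32)

def split6(x48):
    # Divide a 48-bit value into eight 6-bit indexes, leftmost first.
    return [(x48 >> (42 - 6 * i)) & 0x3F for i in range(8)]

def build_sp_boxes(s_boxes, p_table):
    sp = []
    for i, s in enumerate(s_boxes):
        shift = 28 - 4 * i  # position of S-box i's nibble in the 32-bit word
        # Each entry is the 4-bit output already permuted; other bits are zero.
        sp.append([permute(s[idx] << shift, p_table, 32) for idx in range(64)])
    return sp

def sp_lookup(sp_boxes, x48):
    idx = split6(x48)
    out = sp_boxes[0][idx[0]]
    for i in range(1, 8):
        out ^= sp_boxes[i][idx[i]]  # seven XORs combine the eight lookups
    return out

def s_then_p(s_boxes, p_table, x48):
    s_out = 0
    for i, idx in enumerate(split6(x48)):
        s_out = (s_out << 4) | s_boxes[i][idx]
    return permute(s_out, p_table, 32)  # explicit run-time permutation

SP_BOXES = build_sp_boxes(S_BOXES, P_TABLE)
```

The equivalence holds because a bit permutation distributes over XOR and the eight S-box outputs occupy disjoint bit positions, so OR and XOR give the same combined result.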
TABLE 1
DES EXPANSION PERMUTATION
32   1   2   3   4   5
 4   5   6   7   8   9
 8   9  10  11  12  13
12  13  14  15  16  17
16  17  18  19  20  21
20  21  22  23  24  25
24  25  26  27  28  29
28  29  30  31  32   1
Table 1 illustrates the DES Expansion Permutation. The Expansion Permutation is easier to compute than the P-box Permutation in software, for most of the permutation involves 6-bit blocks rather than individual bits. Each entry represents a bit of the 48-bit permutation output; the entries are arranged in increasing order from left to right and then from top to bottom. The number in each entry indicates the location of the bit in the 32-bit input that is mapped to the output bit corresponding to that entry. For example, bit 2 of the 32-bit input R(j) is mapped to bit 3 of the Expansion Permutation output, as the numeral 2 appears in the third entry. Similarly, bit 12 of R(j) is mapped to bits 17 and 19 of the output. In straightforward software implementations of DES, the Expansion Permutation requires very little computation. The inputs to six of the eight S-boxes are simply the XOR results of six contiguous bits of R(j) with six bits of the round key RK(j). Preparing the inputs to the first and the eighth S-boxes requires additional special computation due to discontinuities in the Expansion Permutation. For example, the first six bits of the output of the Expansion Permutation, which are involved in the generation of the input to the first SP-box, include bits from both the right and the left ends of the 32-bit value R(j). These discontinuities are easy to handle. On many RISC processors, only one assembly instruction per DES round, if any at all, is required to complete this special computation.
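The observation that the Expansion Permutation mostly moves contiguous groups of bits can be checked with a short Python sketch. The example below is illustrative only: the function names are hypothetical, and it assumes that bit 1 is the most significant bit of R(j). It computes the expansion once bit by bit from Table 1 and once as eight 6-bit windows of a rotated copy of R(j), where the rotation absorbs the wraparound needed for the first and eighth windows.

```python
E_TABLE = [
    32,  1,  2,  3,  4,  5,
     4,  5,  6,  7,  8,  9,
     8,  9, 10, 11, 12, 13,
    12, 13, 14, 15, 16, 17,
    16, 17, 18, 19, 20, 21,
    20, 21, 22, 23, 24, 25,
    24, 25, 26, 27, 28, 29,
    28, 29, 30, 31, 32,  1,
]

def expand_by_table(r):
    # Bit-by-bit expansion of a 32-bit value into 48 bits per Table 1.
    out = 0
    for src in E_TABLE:
        out = (out << 1) | ((r >> (32 - src)) & 1)
    return out

def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def expand_by_windows(r):
    # Window k holds input bits 4k, 4k+1, ..., 4k+5 (cyclically), so each
    # window is the top six bits of R rotated left by 4k - 1 positions.
    out = 0
    for k in range(8):
        out = (out << 6) | (rotl32(r, 4 * k - 1) >> 26)
    return out
```

With an input that has only bit 2 set, expand_by_table sets only output bit 3, and with only bit 12 set it sets output bits 17 and 19, matching the examples given in the text.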
To improve performance of software implementations of DES for processors with large memory caches, the SP-box lookup tables can be combined in pairs to form four large lookup tables. By using four SP-box tables rather than eight, the number of table lookups per round is reduced from eight to four, and the number of bitwise XOR (or bitwise OR) operations required to combine the results is reduced from seven to three. Each of these large SP-box tables is indexed with a 12-bit input (from the combination of two 6-bit inputs) and contains 4096 entries. A table entry in a large SP-box corresponding to a 12-bit index W, which is the concatenation of two 6-bit indexes U and V, is the result of bitwise XORing the entry corresponding to the index U in the first small table with the entry corresponding to the index V in the second small table. If the size of a table entry is 32 bits, the total size of all four tables is 64 kilobytes. Large SP-boxes do not reduce the number of instructions required to complete the Expansion Permutation and the round key XOR operations, however.
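The pairing of small tables into large tables can be sketched as follows. This Python example is a minimal sketch: the eight small tables are filled with seeded random 32-bit placeholder values rather than real SP-box entries, and the names merge_pair, lookup_small, and lookup_large are hypothetical. It illustrates the index concatenation and the resulting table sizes only.

```python
import random

_rng = random.Random(1)
# Placeholder small tables (NOT real SP-box contents): eight tables of
# sixty-four 32-bit entries each.
SMALL = [[_rng.getrandbits(32) for _ in range(64)] for _ in range(8)]

def merge_pair(first, second):
    # Entry for 12-bit index W = U || V is first[U] XOR second[V].
    return [first[w >> 6] ^ second[w & 0x3F] for w in range(4096)]

LARGE = [merge_pair(SMALL[2 * i], SMALL[2 * i + 1]) for i in range(4)]

def lookup_small(x48):
    # Eight lookups with 6-bit indexes, combined with seven XORs.
    out = SMALL[0][(x48 >> 42) & 0x3F]
    for i in range(1, 8):
        out ^= SMALL[i][(x48 >> (42 - 6 * i)) & 0x3F]
    return out

def lookup_large(x48):
    # Four lookups with 12-bit indexes, combined with three XORs.
    out = LARGE[0][(x48 >> 36) & 0xFFF]
    for i in range(1, 4):
        out ^= LARGE[i][(x48 >> (36 - 12 * i)) & 0xFFF]
    return out

# Four tables of 4096 four-byte entries.
TOTAL_BYTES = 4 * 4096 * 4
```

The large tables trade memory for fewer lookups: the same 48-bit value is consumed in four 12-bit pieces instead of eight 6-bit pieces, which is why a large data cache is needed for the approach to pay off.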