In devices implementing encryption/decryption algorithms it is necessary to perform binary functions over an input set of bytes to be encrypted/decrypted for generating corresponding output bytes. These operations include a one-to-one binary function that is implemented by logic circuitry. This logic circuitry is required to be fast, consume low power and occupy a small silicon area.
Because of the importance of the Rijndael AES encryption/decryption algorithm, the problem to be addressed is presented with respect to this algorithm, but the same considerations also hold for any binary one-to-one function.
A brief overview of AES will now be provided. On Jan. 2, 1997, the National Institute of Standards and Technology (NIST) announced the beginning of the development of the Advanced Encryption Standard (AES). The overall goal was to develop a Federal Information Processing Standard (FIPS) that specified an encryption algorithm capable of protecting sensitive (unclassified) government information into the twenty-first century.
The formal call for algorithms was made on Sep. 12, 1997. The algorithms were required to implement symmetric key cryptography as a block cipher and to support a block size of 128 bits, and key sizes of 128, 192 and 256 bits. On Aug. 20, 1998, NIST announced fifteen AES candidate algorithms at the First AES Candidate Conference, and solicited public comments on the candidates. A Second AES Candidate Conference was held in March 1999 to discuss the results of the analysis that was conducted by the international cryptographic community on the candidate algorithms. In August 1999, NIST announced its selection of five finalist algorithms from the fifteen candidates. The selected algorithms were MARS, RC6, Rijndael, Serpent and Twofish.
Considerable attention was paid to the complexity of the algorithms. A good AES algorithm was required to be easy to implement on general purpose processors and on reconfigurable hardware, and to be light from a computational point of view.
NIST judged the Rijndael algorithm to be the best algorithm for the AES at the end of a very long and complex evaluation process in which all public comments, papers, verbal comments at conferences, NIST studies and reports had been analyzed. The official announcement was made on Oct. 2, 2000 and the standard was completed with the publication of FIPS-197 [1] on Nov. 26, 2001.
Many ways of forming devices for efficiently implementing the Rijndael AES encryption/decryption algorithm have been investigated. Papers (see for instance [2, 3]) that describe implementations of the AES algorithm on field programmable gate arrays (FPGAs) are present in the technical literature. Few of them (see [4]) describe an implementation of the Rijndael algorithm on application specific integrated circuit (ASIC) platforms.
A custom but still flexible implementation, however, would be desirable for high speed or embedded dedicated cores in which fast and/or low-power computation is required. The silicon area consumed in forming an electronic circuit that performs the steps of the Rijndael algorithm is an important parameter for custom implementations. The smaller the consumed area, the higher the number of hardware devices that can be produced on the same silicon wafer, and thus the lower the unit cost.
It has been observed [5] that the realization of the so-called S-box, which is a hardware device that performs the byte substitution operation (ByteSub) contemplated by the algorithm, is critical for reducing the area consumption. This operation is the bottleneck of the algorithm because it must be repeated many times and it is implemented by a one-to-one nonlinear binary function. Moreover, given that this function is nonlinear, it is not possible to form a hardware device for performing it by using standard synthesis techniques.
The ByteSub operation is a composition of two binary functions defined on bytes. The first function is an inversion in the finite field (or Galois Field) GF(2^8), which is a field composed of bytes, while the second is an affine function. Optimum security properties are obtained for the entire cipher system by combining these two functions in cascade (see for instance [6] and [7]).
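The composition described above can be illustrated in software. The following is a minimal Python sketch, not a hardware description: the function names (gf256_mul, gf256_inv, affine, byte_sub) are chosen here for illustration, while the reduction polynomial 0x11B and the affine constant 0x63 are those specified in FIPS-197 [1]. The inversion is done by brute-force exponentiation purely for clarity.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the field polynomial of FIPS-197


def gf256_mul(a, b):
    """Multiply two bytes as elements of GF(2^8) (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree 8 reached: reduce modulo AES_POLY
            a ^= AES_POLY
        b >>= 1
    return result


def gf256_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):       # a^254 = a^(-1), since a^255 = 1 for a != 0
        result = gf256_mul(result, a)
    return result


def affine(b):
    """Affine function of FIPS-197: bit i of the output is
    b_i ^ b_(i+4) ^ b_(i+5) ^ b_(i+6) ^ b_(i+7) ^ c_i, indices mod 8, c = 0x63."""
    out = 0
    for i in range(8):
        bit = (b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
        bit ^= (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)
        out |= (bit & 1) << i
    return out


def byte_sub(x):
    """ByteSub: field inversion followed by the affine function."""
    return affine(gf256_inv(x))
```

For example, byte_sub(0x53) evaluates to 0xED, which matches the worked example in FIPS-197, and applying byte_sub to all 256 byte values yields 256 distinct outputs, consistent with the function being one-to-one.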
The first function is more complex than the second from a computational point of view because it behaves almost like a purely random function. For this reason, the S-box is generally synthesized starting from the complete truth table of the function. A behavioral table-like description is provided to a VHDL compiler, and the corresponding combinatorial function is extracted and synthesized with logic gates.
An alternative approach considers the ByteSub operation as a function defined on the composite finite field GF((2^4)^2). The input byte is separated into two nibbles and the problem of inverting the first function is reduced to the problem of inverting a function in the inner field GF(2^4), which is smaller. This method was originally proposed by Rijmen in [8], and further developed in [9]. See also [10], which contains performance estimations of the obtained circuit when implemented with the HCCMOS7 technology library of STMicroelectronics. A further study on the optimization of the S-box has been proposed in [11], mainly addressing speed issues.
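The reduction to the inner field can be sketched as follows. This Python fragment is only an illustration under stated assumptions, not the circuit of [8] to [11]: the inner field polynomial x^4 + x + 1, the outer polynomial y^2 + y + LAMBDA with LAMBDA = 0x9, and all function names are choices made here. The change of basis between the AES representation of a byte and its composite-field representation is deliberately omitted; the sketch works directly on composite-field elements a = a_h·y + a_l, stored as a byte with a_h in the high nibble.

```python
GF16_POLY = 0x13  # x^4 + x + 1 (assumed inner field polynomial)
LAMBDA = 0x9      # assumed constant; y^2 + y + LAMBDA must be irreducible over GF(2^4)


def gf16_mul(a, b):
    """Multiply two nibbles as elements of GF(2^4)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x10:           # degree 4 reached: reduce modulo GF16_POLY
            a ^= GF16_POLY
        b >>= 1
    return result


def gf16_inv(a):
    """Inverse in GF(2^4) as a^14 = a^2 * a^4 * a^8; 0 is mapped to 0."""
    a2 = gf16_mul(a, a)
    a4 = gf16_mul(a2, a2)
    a8 = gf16_mul(a4, a4)
    return gf16_mul(gf16_mul(a2, a4), a8)


def composite_mul(a, b):
    """Multiply in GF((2^4)^2), using y^2 = y + LAMBDA."""
    ah, al = a >> 4, a & 0xF
    bh, bl = b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, LAMBDA) ^ gf16_mul(al, bl)
    return (high << 4) | low


def composite_inv(a):
    """Invert in GF((2^4)^2) with a single inversion in GF(2^4)."""
    ah, al = a >> 4, a & 0xF
    # d = a_h^2 * LAMBDA + a_h * a_l + a_l^2 lies in the inner field
    d = gf16_mul(gf16_mul(ah, ah), LAMBDA) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    d_inv = gf16_inv(d)        # the only field inversion required
    return (gf16_mul(ah, d_inv) << 4) | gf16_mul(ah ^ al, d_inv)
```

The point of the sketch is that composite_inv needs only a handful of 4-bit multiplications and one 4-bit inversion, which is why the composite-field formulation tends to map to less logic than a direct 8-bit inversion.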
Another important issue in custom implementations is power consumption. Although many techniques are described in the technical literature for reducing power consumption at the transistor level and at higher levels, manual analysis and design are still often useful for producing low power custom units. Synthesis tools have features for power optimization, but most of the power can still be saved by human analysis and by giving the VHDL compiler a good starting point.
Moreover, power optimization often conflicts with other design goals, such as a small chip area and high speed, high frequency operation. A low power approach for designing the Rijndael S-box is discussed in [12]. The authors start from the compact implementation discussed in [8], exploit a positive polarity Reed-Muller (PPRM) technique, and add delay chains to reduce the power consumption and glitches of the architecture.