1. Field of the Invention
The present invention relates generally to the field of message authentication, and more specifically to an authentication implementation which may be applied for cryptography acceleration. In particular, the invention is directed to a hardware implementation to increase the speed at which SHA1 authentication procedures may be performed on data packets transmitted over a computer network.
2. Description of the Related Art
Message authentication is generally discussed in conjunction with cryptography. Cryptography relates to enciphering and deciphering data. Authentication is concerned with data integrity, including confirming the identity of the transmitting party and ensuring that a message (e.g., a data packet) has not been tampered with en route to the recipient. Many cryptography protocols incorporate both encryption/decryption and authentication functionalities. Many methods of practicing both operations are well known in the art and are discussed, for example, in Applied Cryptography, Bruce Schneier, John Wiley & Sons, Inc. (1996, 2nd Edition), herein incorporated by reference.
In order to improve the speed of cryptography and/or authentication processing of data transmitted over a computer network, specialized chips have been developed, for example, the BCM 5805 available from Broadcom Corporation, Irvine, Calif. It is known that by incorporating both cryptography and authentication functionalities in a single accelerator chip, overall system performance can be enhanced. Cryptography accelerator chips may be included in routers or gateways, for example, in order to provide automatic IP packet encryption/decryption and/or authentication. By embedding cryptography and/or authentication functionality in network hardware, both system performance and data security are enhanced.
Examples of cryptography protocols which incorporate encryption/decryption and authentication functionalities include the IP layer security standard protocol, IPSec (RFC2406), and other network security protocols including Secure Socket Layer (SSL) (v3) (Netscape Communications Corporation) (referred to herein as SSL) and Transport Layer Security (TLS) (RFC 2246), all commonly used in electronic commerce transactions. IPSec (RFC2406) specifies two standard algorithms for performing authentication operations, HMAC-MD5-96 (RFC2403) and HMAC-SHA1-96 (RFC2404). SSL and TLS use a MAC and an HMAC, respectively, for authentication. The underlying hash algorithm in either case can be either MD5 (RFC1321) or SHA1 (NIST (FIPS 180-1)). SSL and TLS deploy such well-known algorithms as RC4, DES, and triple DES for encryption/decryption operations. These network protocols are also described in detail in E. Rescorla, SSL and TLS: Designing and Building Secure Systems (Addison-Wesley, 2001) and S. A. Thomas, SSL & TLS Essentials: Securing the Web (John Wiley & Sons, Inc. 2000), both of which are incorporated by reference herein for all purposes. These protocols and their associated algorithms are well known in the cryptography and authentication arts and are described in detail in the noted National Institute of Standards and Technology (NIST), IETF (identified by RFC number) and other noted sources and specifications, incorporated herein by reference for all purposes.
Both MD5 and SHA1 authentication algorithms specify that data is to be processed in 512-bit blocks. If the data in a packet to be processed is not a multiple of 512 bits, padding is applied to round up the data length to a multiple of 512 bits. Thus, if a data packet that is received by a chip for authentication is larger than 512 bits, the packet is broken into 512-bit data blocks for authentication processing. If the packet is not a multiple of 512 bits, the data left over following splitting of the packet into complete 512-bit blocks must be padded in order to reach the 512-bit block processing size. The same is true if a packet contains fewer than 512 bits of data. For reference, a typical Ethernet packet is up to 1,500 bytes. When such a packet is split into 512-bit blocks, only the last block is padded, so that overall a relatively small percentage of padding overhead is required. However, for shorter packets, the padding overhead can be much higher. For example, if a packet has just over 512 bits it will need to be divided into two 512-bit blocks, the second of which is mostly padding, so that padding overhead approaches 50% of the processed data. The authentication of such short data packets is particularly burdensome and time consuming using the conventionally implemented MD5 and SHA1 authentication algorithms.
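By way of illustration only, the block count and padding overhead discussed above may be sketched in Python as follows. The sketch assumes the padding rule of RFC1321 and FIPS 180-1, under which at least 65 bits (a single "1" bit followed by a 64-bit length field) are appended to the data before rounding up to a multiple of 512 bits. The names `padded_blocks` and `padding_overhead` are hypothetical and are used here only for explanation.

```python
BLOCK_BITS = 512
MIN_PAD_BITS = 65  # assumption per RFC1321 / FIPS 180-1: one '1' bit plus a 64-bit length field


def padded_blocks(data_bits: int) -> int:
    """Return the number of 512-bit blocks processed for data_bits of packet data."""
    return (data_bits + MIN_PAD_BITS + BLOCK_BITS - 1) // BLOCK_BITS


def padding_overhead(data_bits: int) -> float:
    """Return the fraction of the processed bits that is padding rather than data."""
    total_bits = padded_blocks(data_bits) * BLOCK_BITS
    return (total_bits - data_bits) / total_bits


# A 1,500-byte packet, a packet just over 512 bits, and a packet just under 512 bits.
for bits in (1500 * 8, 520, 448):
    print(bits, padded_blocks(bits), round(padding_overhead(bits), 3))
```

Under these assumptions, a 1,500-byte packet occupies 24 blocks with roughly 2% padding, whereas a 520-bit packet occupies two blocks of which nearly half is padding.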
For each 512-bit data block, a set of operations including non-linear functions, shift functions and additions, called a “round,” is applied to the block repeatedly. MD5 and SHA1 specify 64 rounds and 80 rounds, respectively, based on different non-linear and shift functions, as well as different operating sequences. In every round, the operation starts with certain hash states (referred to as “context”) held by hash state registers (in hardware) or variables (in software), and ends with a new set of hash states (i.e., an initial “set” of hash states and an end set; a “set” consists of 4 or 5 hash states, corresponding to the number of registers used by MD5 and SHA1, respectively). MD5 and SHA1 each specify a set of constants as the initial hash states for the first 512-bit block. Each subsequent block uses initial hash states obtained by adding the initial hash states and the ending hash states of the preceding block.
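For purposes of explanation, the round structure and the chaining of hash states described above may be sketched in software as follows. This is a minimal Python sketch of SHA1 as specified in FIPS 180-1, not a description of any particular hardware implementation. The function names `rotl`, `sha1_block`, `sha1_pad` and `sha1` are hypothetical, and the result is checked against the standard `hashlib` module.

```python
import hashlib
import struct

# Initial hash states (constants) specified by FIPS 180-1 for the first block.
SHA1_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
MASK = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_block(state, block: bytes):
    """Apply the 80 SHA1 rounds to one 512-bit block, then add the starting hash states."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = (rotl(a, 5) + f + e + k + w[t]) & MASK, a, rotl(b, 30), c, d
    # Addition of the initial and ending hash states yields the states for the next block.
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))


def sha1_pad(message: bytes) -> bytes:
    """Append a '1' bit, zero bits, and the 64-bit message length in bits."""
    zeros = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zeros + struct.pack(">Q", len(message) * 8)


def sha1(message: bytes) -> bytes:
    """Hash a message by chaining the hash states across its 512-bit blocks."""
    state = SHA1_INIT
    padded = sha1_pad(message)
    for i in range(0, len(padded), 64):
        state = sha1_block(state, padded[i:i + 64])
    return struct.pack(">5I", *state)


assert sha1(b"abc") == hashlib.sha1(b"abc").digest()
```

The final line of `sha1_block` corresponds to the addition of hash states noted above; in a hardware implementation, this addition may consume clock cycles beyond the 80 rounds unless it is overlapped with other operations.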
Typically, MD5 and SHA1 rounds are translated into clock cycles in hardware implementations. The addition of the hash states, to the extent that it cannot be performed in parallel with other round operations, requires overhead clock cycles in the whole computation. The computation of the padded portion of the data is also generally considered performance overhead because it is not part of the true data. Accordingly, the performance of MD5 and SHA1 degrades the most when the length of the padding is about the same as the length of the data (e.g., as described above, when a packet has just fewer than 512 bits of data and the padding logic requires an extra 512-bit block to be added to hold the pad values).
Moreover, the HMAC-MD5-96 and HMAC-SHA1-96 algorithms used in IPSec expand MD5 and SHA1, respectively, by performing two loops of operations. The HMAC algorithm for either MD5 or SHA1 (HMAC-x algorithm) is depicted in FIG. 1. The inner hash (inner loop) and the outer hash (outer loop) use different initial hash states. The outer hash is used to compute a digest based on the result of the inner hash. Since the result of the inner hash is 128 bits long for MD5 and 160 bits long for SHA1, the result must always be padded up to 512 bits and the outer hash only processes the one 512-bit block of data. HMAC-MD5-96 and HMAC-SHA1-96 provide a higher level of security; however, additional time is needed to perform the outer hash operation. This additional time becomes significant when the length of the data to be processed is short, in which case the time required to perform the outer hash operation is comparable to the time required to perform the inner hash operation.
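By way of example only, the two-loop structure of HMAC-SHA1-96 may be sketched in Python as follows, using the standard `hashlib` module and the construction of RFC 2104 with truncation to 96 bits per RFC2404. The function name `hmac_sha1_96` is hypothetical. In this software sketch the key-dependent 512-bit blocks (`ipad`, `opad`) are hashed explicitly; hashing such a block first is equivalent to starting the inner or outer loop from a key-dependent initial hash state.

```python
import hashlib
import hmac

BLOCK_BYTES = 64  # 512-bit block size of MD5 and SHA1


def hmac_sha1_96(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA1 and truncate the digest to 96 bits (12 bytes)."""
    if len(key) > BLOCK_BYTES:
        key = hashlib.sha1(key).digest()  # long keys are hashed first (RFC 2104)
    key = key.ljust(BLOCK_BYTES, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha1(ipad + message).digest()  # inner loop: 160-bit result
    outer = hashlib.sha1(opad + inner).digest()    # outer loop: one padded block of data
    return outer[:12]


# Cross-check against the standard library's HMAC implementation.
key, packet = b"\x0b" * 20, b"short packet"
assert hmac_sha1_96(key, packet) == hmac.new(key, packet, hashlib.sha1).digest()[:12]
```

For a short packet, the inner loop processes a single data block and the outer loop processes a single block, which illustrates why the outer hash time is comparable to the inner hash time in that case.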
Authentication represents a significant proportion of the time required to complete cryptography operations in the application of cryptography protocols incorporating both encryption/decryption and MD5 and/or SHA1 authentication functionalities. In the case of IPSec, authentication is often the time-limiting step, particularly for the processing of short packets, and thus creates a data processing bottleneck. In particular, of the two algorithms supported by the IPSec protocol, HMAC-SHA1-96 is about twenty-five percent slower than HMAC-MD5-96 in terms of the total computation rounds (80 rounds per block for SHA1 versus 64 for MD5). Accordingly, techniques to accelerate authentication and relieve this bottleneck would be desirable. Further, accelerated implementations of SHA1 would benefit any application of this authentication algorithm.