1. Field of the Invention
The present invention relates generally to hardware cryptographic execution units, and more particularly, to methods and systems for an improved MD5 cryptographic execution unit.
2. Description of the Related Art
Data encryption has become commonplace. By way of example, e-commerce transactions between customers and merchants on the Internet have driven the demand for efficient and secure interchange of sensitive personal and financial data between the parties. There are four basic categories of cryptographic algorithm functionality: public key encryption algorithms, bulk encryption algorithms, random number generation algorithms and hashing algorithms. In order to ensure data integrity, several standard cryptographic hash algorithms have been developed, including MD5 (i.e., Message Digest 5), SHA-1 (i.e., Secure Hash Algorithm 1) and other cryptographic hash algorithms. The MD5 and SHA-1 algorithms are described in detail in the “Handbook of Applied Cryptography” by Alfred J. Menezes, Paul C. van Oorschot and Scott A. Vanstone, which is incorporated by reference herein for all purposes.
Briefly described, MD5 is a one-way hash function algorithm that is used in many situations, such as to create digital signatures. MD5 was designed for use with 32-bit machines and is more secure than earlier cryptographic algorithms (e.g., the MD4 algorithm). A one-way hash function means that MD5 takes a message of arbitrary length and converts it into a fixed-length string of digits (i.e., a 128-bit message digest). The one-way hash function allows a message digest calculated from the received message to be compared against the message digest that is decrypted with a public key, to verify that the message has not been tampered with. This comparison is called a “hashcheck.”
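By way of illustration only, the hashcheck comparison described above can be sketched in software as follows. This is a minimal sketch that assumes Python's standard hashlib module; the function names md5_digest and hashcheck are hypothetical and do not appear in the referenced handbook, and the step of decrypting the expected digest with a public key is assumed to have already occurred.

```python
import hashlib
import hmac


def md5_digest(message: bytes) -> str:
    """Return the 128-bit MD5 message digest as 32 hexadecimal characters."""
    return hashlib.md5(message).hexdigest()


def hashcheck(message: bytes, expected_digest: str) -> bool:
    """Compare the digest calculated from the message with the expected digest.

    The expected digest is assumed to have been recovered elsewhere
    (e.g., by decrypting a signature with a public key).
    """
    calculated = md5_digest(message)
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(calculated, expected_digest.lower())


if __name__ == "__main__":
    original = b"transfer 100 to account 42"
    digest = md5_digest(original)
    print(hashcheck(original, digest))                       # unmodified message
    print(hashcheck(b"transfer 900 to account 42", digest))  # tampered message
```

In this sketch, any change to the message produces a different digest, so the second comparison fails.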
FIG. 1 shows a typical server 102 and client computer 110 that are linked by a network 104, such as the Internet or other network. FIG. 2 is a high-level block diagram of a typical server 102. As shown, the server 102 includes a processor 202, ROM 204, and RAM 206, each connected by a peripheral bus system 208. The peripheral bus system 208 may include one or more buses coupled to each other through various bridges, controllers and/or adapters, such as are well known in the art. For example, the peripheral bus system 208 may include a “system bus” that is connected through an adapter to one or more expansion buses, such as a Peripheral Component Interconnect (PCI) bus. Also coupled to the peripheral bus system 208 are a mass storage device 210, a network interface 212, a number (N) of input/output (I/O) devices 216-1 through 216-N and a peripheral cryptographic processor 220.
I/O devices 216-1 through 216-N may include, for example, a keyboard, a pointing device, a display device and/or other conventional I/O devices. Mass storage device 210 may include any suitable device for storing large volumes of data, such as a magnetic disk or tape, magneto-optical (MO) storage device, or any of various types of Digital Versatile Disk (DVD) or Compact Disk (CD) based storage.
The peripheral cryptographic processor 220 (i.e., crypto-processor) is linked to the processor 202 by the peripheral bus system 208. The crypto-processor 220 includes one or more crypto processing units 228A, 228B. Each of the crypto processing units 228A, 228B performs a single crypto algorithm (e.g., MD5 or SHA-1). The crypto-processor 220 performs encryption and decryption operations that may be necessary for encrypted data transactions, such as those between the server 102 and the client computer 110. In some servers, the crypto-processor 220 can instead be external to the server 102 and linked to the processor 202 by one of the I/O devices 216-1 through 216-N.
Network interface 212 provides data communication between the computer system and other computer systems on the network 104. Hence, network interface 212 may be any device suitable for or enabling the server 102 to communicate data with a remote processing system (e.g., client computer 110) over a data communication link, such as a conventional telephone modem, an Integrated Services Digital Network (ISDN) adapter, a Digital Subscriber Line (DSL) adapter, a cable modem, a satellite transceiver, an Ethernet adapter, or the like.
Typically, the processor 202 can operate at clock speeds of 1 GHz or more. In contrast, the peripheral bus system 208 typically operates at a substantially slower speed, such as about 166 MHz or similar. Further, the crypto-processor 220 typically operates at a speed similar to that of the peripheral bus system 208. This is because the crypto-processor 220 cannot process data any faster than the data can be transported across the peripheral bus system 208. Further, the crypto-processor 220 is typically a customized, specialized processor (i.e., an application specific integrated circuit (ASIC)) that may not be manufactured with the latest, highest performance manufacturing technologies, and therefore the maximum processing speed (i.e., the crypto-processor clock speed) of the crypto-processor 220 is typically substantially less than the maximum processing speed of the processor 202.
FIG. 3 is a flowchart diagram of the method operations 300 of a typical encrypted data transaction within the server 102. The encrypted data transaction can be any data transaction that requires encryption, decryption or both encryption and decryption, such as an e-commerce transaction between the server 102 and the client computer 110. In operation 305, data is received in the server 102, such as from the client computer 110 or because of a request by the client computer 110.
In operation 310, the received data is analyzed to determine if the received data is encrypted. For example, the data may be encrypted because the data includes a user's personal and/or financial data or other data that is transported during an encrypted session.
If, in operation 310, the received data is determined not to be encrypted data, then the received data is processed as described in operation 330 below. Alternatively, if, in operation 310, the received data is determined to be encrypted data, then, in operation 315, the encrypted data is sent to the peripheral crypto-processor 220 via the peripheral bus system 208.
In operation 320, the crypto-processor 220 decrypts the encrypted data. In operation 325, the crypto-processor 220 outputs the decrypted data to the processor 202 via the peripheral bus system 208. In operation 330, the processor 202 processes the data to produce result data.
In operation 335, the result data is analyzed to determine if the result data should be encrypted. If the result data does not require encryption, then the processor 202 outputs the result data to the client computer 110, in operation 340, and the method operations end. Alternatively, if, in operation 335, the result data requires encryption, then, in operation 345, the processor 202 outputs the result data to the crypto-processor 220 via the peripheral bus system 208.
In operation 350, the crypto-processor 220 encrypts the result data. In operation 355, the crypto-processor 220 outputs the encrypted result data to the processor 202 via the peripheral bus system 208. In operation 360, the processor 202 outputs the encrypted result data to the client computer 110 and the method operations end.
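By way of illustration only, the control flow of the method operations 300 can be sketched in software as follows, recording each operation and counting the transfers across the peripheral bus system 208. This is a hedged sketch and not an implementation of the server 102: the names Trace, toy_cipher and run_transaction are hypothetical, the two decisions of operations 310 and 335 are supplied as flags rather than determined by analysis, and a simple XOR stands in for whatever algorithm the crypto-processor 220 actually performs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """Records which operations ran and how many bus transfers occurred."""
    operations: List[int] = field(default_factory=list)
    bus_transfers: int = 0

    def step(self, operation: int, uses_bus: bool = False) -> None:
        self.operations.append(operation)
        if uses_bus:
            self.bus_transfers += 1


def toy_cipher(data: bytes) -> bytes:
    """Stand-in for the crypto-processor (assumption): XOR is its own inverse."""
    return bytes(b ^ 0x5A for b in data)


def run_transaction(received: bytes, is_encrypted: bool, encrypt_result: bool,
                    process: Callable[[bytes], bytes]) -> Tuple[bytes, Trace]:
    trace = Trace()
    trace.step(305)                    # data is received in the server
    trace.step(310)                    # is the received data encrypted?
    data = received
    if is_encrypted:
        trace.step(315, uses_bus=True)  # send to crypto-processor over the bus
        data = toy_cipher(data)
        trace.step(320)                 # crypto-processor decrypts
        trace.step(325, uses_bus=True)  # decrypted data returned over the bus
    result = process(data)
    trace.step(330)                    # processor produces result data
    trace.step(335)                    # should the result data be encrypted?
    if not encrypt_result:
        trace.step(340)                 # output unencrypted result to client
        return result, trace
    trace.step(345, uses_bus=True)     # send result to crypto-processor
    result = toy_cipher(result)
    trace.step(350)                    # crypto-processor encrypts
    trace.step(355, uses_bus=True)     # encrypted result returned over the bus
    trace.step(360)                    # output encrypted result to client
    return result, trace


if __name__ == "__main__":
    _, trace = run_transaction(toy_cipher(b"hello"), True, True, bytes.upper)
    print(trace.operations, trace.bus_transfers)
```

In this sketch, a transaction that requires both decryption and encryption crosses the peripheral bus four times, whereas an unencrypted transaction does not cross it at all.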
Transferring the data to be encrypted, decrypted or processed between the crypto-processor 220 and the processor 202 is slow, because each transfer must cross the peripheral bus system 208. Further, the slower processing speed of the crypto-processor 220 also limits the rate at which the data is encrypted or decrypted. Further, if a large volume of data such as streaming data (e.g., streaming audio, streaming video, etc.) is being encrypted and/or decrypted, then the rate at which the server 102 can serve the streaming data is limited by the rate at which the streaming data can be encrypted and/or decrypted. Further still, the multiple transfers of the streaming data between the crypto-processor 220 and the processor 202 can dominate the usage of the peripheral bus system 208 and the I/O systems inside the crypto-processor 220 and the processor 202, thereby further limiting the ability of the processor 202 to perform any functions other than transferring data to and from the crypto-processor 220.
Because speed of execution is nearly always a paramount consideration, the crypto-processor 220 needs to perform as fast as possible. By way of example, a typical MD5 hash algorithm iteration may require four clock cycles (e.g., a 4-stage pipeline for the processor performing the hash computation), and a complete computation of the MD5 hash algorithm requires 64 iterations of the iterative portion of the MD5 hash algorithm. As a result, 256 clock cycles (i.e., 64 iterations at four clock cycles each) may be required to complete the computation of the MD5 hash algorithm. The MD5 hash algorithm requires only 64 iterations, rather than the 80 iterations for a typical SHA-1 computation. However, each SHA-1 iteration needs only half as many clock cycles (i.e., two clock cycles) as an MD5 iteration. Thus, due to the additional clock cycles required for each iteration, MD5 requires about 60% more clock cycles than SHA-1 (i.e., 256 clock cycles for MD5 vs. 160 clock cycles for SHA-1).
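By way of illustration only, the clock cycle comparison above can be checked with the following short calculation. The iteration counts and the MD5 cycles per iteration are the example figures stated above; the two clock cycles per SHA-1 iteration follow from the stated total of 160 clock cycles over 80 iterations. The function name total_clock_cycles is hypothetical.

```python
def total_clock_cycles(iterations: int, cycles_per_iteration: int) -> int:
    """Clock cycles for one complete hash computation in this simple model."""
    return iterations * cycles_per_iteration


# Example figures from the discussion above.
md5_cycles = total_clock_cycles(iterations=64, cycles_per_iteration=4)
sha1_cycles = total_clock_cycles(iterations=80, cycles_per_iteration=2)

# Fraction by which the MD5 cycle count exceeds the SHA-1 cycle count.
md5_excess = (md5_cycles - sha1_cycles) / sha1_cycles

if __name__ == "__main__":
    print(f"MD5:   {md5_cycles} clock cycles")
    print(f"SHA-1: {sha1_cycles} clock cycles")
    print(f"MD5 requires {md5_excess:.0%} more clock cycles than SHA-1")
```

The model ignores any pipelining across iterations or setup overhead, so it reflects only the per-iteration figures given in the example.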
In view of the foregoing, there is a need for an improved crypto arithmetic logic unit that can substantially reduce the processing time for an MD5 hash algorithm.