Hashing functions (or hash functions) are widely used in cryptography because they reduce a digital data word of arbitrary length to a short value of fixed length, for example when producing an electronic signature or certificate that guarantees the integrity and authenticity of a message. Hashing functions also form essential parts of a wide range of protocols, such as cryptographic entity authentication protocols and the SSL/TLS protocol.
As a general rule, a hashing function H is a compression function that converts a large set of characters of any length (referred to as the message or input) into a smaller set of characters of fixed length (called the output, message digest or hash). Because a cryptographic hashing function is a “one-way” function, it is computationally infeasible to retrieve the original set from the message digest.
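As an illustration of the fixed-length output, the following minimal Python sketch hashes a 3-byte input and a one-million-byte input with SHA-256 from the standard hashlib module. SHA-256 and the helper name digest_hex are assumptions chosen only for illustration; the text above does not name a particular function.

```python
import hashlib


def digest_hex(message: bytes) -> str:
    """Return the SHA-256 message digest of `message` as a hex string."""
    return hashlib.sha256(message).hexdigest()


short_msg = b"abc"
long_msg = b"x" * 1_000_000  # one million bytes of input

for msg in (short_msg, long_msg):
    d = digest_hex(msg)
    # The digest is always 64 hex characters (256 bits), whatever the input length.
    print(len(msg), len(d))
```

Whatever the length of the input, the output occupies the same 256 bits.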
One example of this kind of hashing function is described by R. C. Merkle in the paper “One-way Hash Functions and DES” (CRYPTO, Springer-Verlag 1989), where the message digest is calculated by a chaining principle.
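The chaining principle can be sketched as follows: the padded message is cut into fixed-size blocks and a compression function is applied iteratively, each output becoming the chaining value for the next block (the iterated structure usually called Merkle-Damgård). This is a minimal Python sketch under stated assumptions, not Merkle's DES-based construction: the block size, the all-zero initial value and the compress function (SHA-256 applied to the chaining value concatenated with the block) are hypothetical stand-ins.

```python
import hashlib
import struct

BLOCK_SIZE = 64  # bytes per message block (hypothetical choice)
IV = bytes(32)   # fixed initial chaining value (hypothetical: all zeros)


def compress(chaining_value: bytes, block: bytes) -> bytes:
    """Hypothetical compression function f: (32-byte state, 64-byte block) -> 32-byte state.

    SHA-256 over the concatenation stands in for a dedicated compression function.
    """
    return hashlib.sha256(chaining_value + block).digest()


def pad(message: bytes) -> bytes:
    """Length padding: append 0x80, then zero bytes, then the 64-bit message bit length."""
    bit_len = 8 * len(message)
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_SIZE)
    return padded + struct.pack(">Q", bit_len)


def chained_hash(message: bytes) -> bytes:
    """Apply the compression function block by block, chaining each output into the next call."""
    h = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        h = compress(h, padded[i:i + BLOCK_SIZE])
    return h


print(chained_hash(b"abc").hex())
```

Because the message length is encoded in the final block, two messages that differ only by trailing zero bytes still produce different padded inputs.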
To be more precise, a function is a cryptographic hashing function if it satisfies the following three conditions: resistance to attacks “on the first pre-image” (or antecedent), resistance to attacks “on the second pre-image”, and resistance to “collisions”.
Resistance to an attack on the first pre-image makes it very difficult (i.e. computationally infeasible in practice) to recover the content of a message or input x from a given message digest or output y. In other words, given an output y, it is computationally infeasible, with any available algorithm or hardware, to find an input x such that H(x)=y.
Resistance to an attack on the second pre-image makes it very difficult to produce from a message x and its message digest y another message x′ that gives the same message digest y. In other words, given an input-output pair (x, y) where H(x)=y, it is very difficult to find an input x′ where x′≠x such that H(x′)=y.
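The generic attack that these two definitions guard against is exhaustive search. The Python sketch below truncates SHA-256 to a deliberately tiny, hypothetical length of N_BITS = 16 bits (toy_hash and find_preimage are illustrative names) and tries inputs until one maps to the target digest. With an n-bit digest each attempt succeeds with probability about 2^-n, so roughly 2^n attempts are expected.

```python
import hashlib
import itertools
from typing import Optional, Tuple

N_BITS = 16  # hypothetical, deliberately tiny digest length so the search ends quickly


def toy_hash(message: bytes) -> int:
    """Hypothetical n-bit hash: the top N_BITS bits of SHA-256."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - N_BITS)


def find_preimage(target: int, exclude: Optional[bytes] = None) -> Tuple[bytes, int]:
    """Exhaustive search for x with toy_hash(x) == target and x != exclude.

    Returns the input found and the number of attempts needed.
    Passing exclude=x turns a first pre-image search into a second pre-image search.
    """
    for attempts in itertools.count(1):
        candidate = attempts.to_bytes(8, "big")  # distinct 8-byte candidate inputs
        if candidate != exclude and toy_hash(candidate) == target:
            return candidate, attempts


# Second pre-image: given (x, y) with y = toy_hash(x), find x2 != x with the same digest.
x = b"original message"
y = toy_hash(x)
x2, tries = find_preimage(y, exclude=x)
print(x2.hex(), tries)  # tries is typically of the order of 2**16
```

With a realistic digest length such as n = 256, the same loop would need about 2^256 attempts, which is the point of the definitions.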
Resistance to collisions makes it very difficult to find any two distinct messages that give the same message digest. In other words, it is very difficult to find any two inputs x and x′, where x′≠x, such that H(x)=H(x′).
For a message digest having a length of n bits, there are 2^n possible message digests. Moreover, the “birthday” paradox indicates that about 2^(n/2) attempts are sufficient to find a collision by chance.
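The birthday effect can be observed experimentally. In this minimal Python sketch, toy_hash (a hypothetical name) truncates SHA-256 to N_BITS = 32 bits, and birthday_collision stores every digest in a dictionary until one repeats. The number of attempts is typically of the order of 2^16, far below the 2^32 possible digests.

```python
import hashlib
import itertools
from typing import Dict, Tuple

N_BITS = 32  # hypothetical toy digest length; a collision needs about 2**16 attempts


def toy_hash(message: bytes) -> int:
    """Hypothetical n-bit hash: the top N_BITS bits of SHA-256."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") >> (256 - N_BITS)


def birthday_collision() -> Tuple[bytes, bytes, int]:
    """Hash distinct inputs until two share a digest.

    Returns the two colliding inputs and the number of attempts made.
    """
    seen: Dict[int, bytes] = {}  # digest -> first input that produced it
    for attempts in itertools.count(1):
        message = attempts.to_bytes(8, "big")
        digest = toy_hash(message)
        if digest in seen:
            return seen[digest], message, attempts
        seen[digest] = message


m1, m2, attempts = birthday_collision()
print(m1.hex(), m2.hex(), attempts)
```

The dictionary trades memory for time; storing the roughly 2^(n/2) digests seen so far is what lets the search stop at the first repeat.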
Thus a cryptographic hashing function is of good quality if the minimum number of attempts necessary to solve the above three problems is of the order of 2^n, 2^n and 2^(n/2), respectively.
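These orders of magnitude follow from a standard probabilistic estimate, sketched here under the assumption that the digests of distinct inputs behave like independent, uniformly distributed n-bit values. A pre-image search succeeds at each attempt with probability 2^-n, so the expected number of attempts is 2^n. For collisions, the probability that q inputs yield no repeated digest is bounded using 1 - t ≤ e^(-t):

```latex
\begin{aligned}
\mathbb{E}[\text{attempts for a pre-image}] &= \frac{1}{2^{-n}} = 2^{n},\\
\Pr[\text{no collision among } q \text{ inputs}] &= \prod_{i=1}^{q-1}\left(1 - \frac{i}{2^{n}}\right)
  \le \exp\!\left(-\frac{q(q-1)}{2^{n+1}}\right),\\
\Pr[\text{collision}] \approx \tfrac{1}{2}
  &\iff q \approx \sqrt{2\ln 2}\cdot 2^{n/2} \approx 1.18\cdot 2^{n/2}.
\end{aligned}
```

For q much smaller than 2^n the bound is tight, which is why about 2^(n/2) attempts suffice for a collision while about 2^n are needed for a first or second pre-image.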
At present, the hashing functions in widespread use are either designed ad hoc or based on the DES (Data Encryption Standard) block cipher. However, their performance and their message digest lengths are not well suited to current uses in cryptography.
The hashing functions most widely used at present are of the MD5 (Message Digest Algorithm 5) type and the SHA-1 (Secure Hash Algorithm 1) type (see http://www.ietf.org/rfc/rfc3174.txt).
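The digest lengths of these two functions can be checked with Python's standard hashlib module. This minimal sketch (describe is a hypothetical helper name) reports the digest length in bits and the hexadecimal digest of the input "abc"; the expected values are the published test vectors of RFC 1321 (MD5) and RFC 3174 (SHA-1).

```python
import hashlib
from typing import Tuple


def describe(name: str, data: bytes) -> Tuple[int, str]:
    """Return (digest length in bits, hex digest) of `data` under the named algorithm."""
    h = hashlib.new(name, data)
    return 8 * h.digest_size, h.hexdigest()


for algorithm in ("md5", "sha1"):
    bits, hex_digest = describe(algorithm, b"abc")
    print(algorithm, bits, hex_digest)
# md5 128 900150983cd24fb0d6963f7d28e17f72
# sha1 160 a9993e364706816aba3e25717850c26c9cd0d89d
```

By the birthday bound, these digest lengths correspond to generic collision costs of about 2^64 for MD5 and 2^80 for SHA-1.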
However, X. Wang and H. Yu have shown, in their papers “How to Break MD5 and Other Hash Functions” (May 2005) and “Finding Collisions in the Full SHA-1” (August 2005, co-authored with Y. L. Yin), that these functions have weaknesses allowing collisions to be found with far fewer attempts than by pure chance. Other hashing functions of the MD5 family are similarly vulnerable.
Attempts have been made to construct other hashing functions whose security is proven to rely on the difficulty of solving certain classes of problems in the arithmetic of large numbers. The major drawback of these functions is the lack of efficiency inherent in large-number arithmetic. Moreover, their proofs guarantee resistance only to collisions, so they do not satisfy the conditions regarding resistance to attacks on the first pre-image and the second pre-image.
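A classic textbook example of such an arithmetic construction, not named in the text above and given here only as an illustration, is the Chaum-van Heijst-Pfitzmann hash h(x1, x2) = α^x1 · β^x2 mod p. In this construction, p = 2q + 1 with q prime, α and β are primitive roots modulo p, and finding a collision would reveal the discrete logarithm of β to the base α. The Python sketch below uses hypothetical toy parameters (p = 2039, q = 1019) that offer no security at all. It mainly shows that each evaluation costs two modular exponentiations, which becomes expensive when p has thousands of bits.

```python
# Toy parameters (hypothetical and insecure): P is a safe prime, P = 2*Q + 1.
P = 2039
Q = 1019


def is_primitive_root(g: int) -> bool:
    """g generates Z_P^* iff g^2 != 1 and g^Q != 1 (mod P), since P - 1 = 2 * Q."""
    return g % P != 0 and pow(g, 2, P) != 1 and pow(g, Q, P) != 1


# Pick two distinct primitive roots. In a real instance the discrete logarithm
# of BETA to base ALPHA must be unknown to everyone; here it is not protected.
ALPHA = next(g for g in range(2, P) if is_primitive_root(g))
BETA = next(g for g in range(ALPHA + 1, P) if is_primitive_root(g))


def dlog_hash(x1: int, x2: int) -> int:
    """Compress a pair (x1, x2) in [0, Q) x [0, Q) to one element of Z_P^*."""
    if not (0 <= x1 < Q and 0 <= x2 < Q):
        raise ValueError("inputs must lie in [0, Q)")
    # Two modular exponentiations per call: costly when P has thousands of bits.
    return pow(ALPHA, x1, P) * pow(BETA, x2, P) % P


print(ALPHA, BETA, dlog_hash(123, 456))
```

Each call compresses about 20 bits of input into about 11 bits of output, so hashing a long message this way requires many exponentiations, in contrast with the bit-level operations used by MD5 or SHA-1.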