A cryptographic hash function is a transformation that takes an input of arbitrary length and returns a fixed-size string called the hash value. Hash functions with this property serve many computational purposes besides cryptography. The hash value, also called the message digest, is a concise representation of the longer message or document from which it was computed, a sort of “digital fingerprint” of the larger document. In information security, cryptographic hash functions are used for message integrity checks, digital signatures and authentication.
A hash function takes a string (or ‘message’) of any length as input and produces a fixed-length string as output, sometimes termed a message digest or a digital fingerprint. A hash value (also called a “digest” or a “checksum”) acts as a kind of “signature” for a stream of data, representing its contents. A useful analogy for the role of a hash function is the “tamper-evident” seal on a software package: if the seal is intact, the contents have not been altered.
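A minimal Python sketch of the two properties just described, using the standard-library `hashlib` module with SHA-256 as an example function. The helper name `fingerprint` and the sample messages are illustrative assumptions, not part of any standard.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different lengths give digests of the same length:
# SHA-256 always outputs 32 bytes, i.e. 64 hex characters.
short_digest = fingerprint(b"abc")
long_digest = fingerprint(b"a" * 1_000_000)
print(len(short_digest), len(long_digest))  # 64 64

# Tamper-evidence: store the digest of the original message, then
# recompute it later and compare.
original = b"Pay Alice 100 dollars"
stored = fingerprint(original)
received = b"Pay Alice 900 dollars"  # one character altered in transit
print(fingerprint(original) == stored)  # True
print(fingerprint(received) == stored)  # False: the change is detected
```

Like a physical seal, this only helps if the stored digest cannot be replaced together with the message; protecting the digest itself is what a digital signature adds.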
When two messages have the same hash value, this is known as a collision. A good hash function minimizes collisions for a given set of likely inputs, and for a cryptographic hash function it must moreover be computationally infeasible to find one. Methods for designing and analyzing hash functions suitable for digital signature technology are therefore needed.
The MD (Message Digest) and SHA (Secure Hash Algorithm) families have evolved steadily and have been implemented and used in many standards and applications.
The MD (Message-Digest) functions, whose best-known members are MD2, MD4 and MD5, are a series of structured, widely used cryptographic hash functions with a 128-bit hash value.
The SHA (Secure Hash Algorithm) versions (SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512, with 160-, 224-, 256-, 384- and 512-bit outputs) are five cryptographic hash functions designed by the National Security Agency (NSA) and published by NIST as a U.S. Federal Information Processing Standard. Hash functions compute a fixed-length digital representation (known as a message digest) of an input data sequence (the message) of any length. They are called “secure” when, in the words of the standard, “it is computationally infeasible to:
1. find a message that corresponds to a given message digest, or
2. find two different messages that produce the same message digest.
Any change to a message will, with a very high probability, result in a different message digest.”
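The output sizes quoted above can be checked against Python's standard `hashlib` module. This is a small sketch; the list name `ALGORITHMS` and the helper `digest_bits` are illustrative choices, and MD5 is included only for comparison with the MD series.

```python
import hashlib

# Algorithm names as accepted by hashlib.new().
ALGORITHMS = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512"]

def digest_bits(name: str) -> int:
    """Return the output length of a hashlib algorithm in bits."""
    return hashlib.new(name).digest_size * 8

for name in ALGORITHMS:
    print(f"{name:>6}: {digest_bits(name)} bits")
```

This prints 128, 160, 224, 256, 384 and 512 bits respectively. On Python builds that enforce FIPS mode, `hashlib.new("md5")` may be refused, which is itself a sign of how MD5 is now regarded.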
The recent advances in cryptanalysis of hash functions have been spectacular, and the collision attacks on MD5 and SHA-1 are of particular importance since these are so widely deployed.
MD5 collisions can be easily found. The analytical attack was reported to take one hour on an IBM p690 cluster. MD5 has been known to be weak for a long time but it is still used with no catastrophic consequences.
SHA-1 is also widely deployed but has collision-resistance problems. Collisions for SHA-1 can be found if the number of rounds is reduced from 80 to about 50. For the full 80-round SHA-1, collisions can in theory be found with about $2^{69}$ hash evaluations, but this is still too expensive to carry out in practice. At the time of writing, no one had found an actual collision for SHA-1 using all rounds.
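To see what the $2^{69}$ figure means, compare it with the generic birthday bound: a brute-force collision search on an $n$-bit hash needs about $2^{n/2}$ evaluations, and SHA-1 has $n = 160$.

$$2^{160/2} = 2^{80}, \qquad \frac{2^{80}}{2^{69}} = 2^{11} = 2048$$

The attack is therefore roughly 2,000 times faster than brute force. That is why it counts as a break of collision resistance even though $2^{69}$ operations remained impractical.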
SHA-1 is derived from SHA-0, and SHA-256 is closely related to SHA-1. These functions rely on an intuition-based design that has already failed twice, for SHA-0 and SHA-1. Given the attacks on the collision resistance of SHA-1 and the close relationship between the designs of SHA-1 and SHA-256, there is little confidence in the collision resistance of SHA-256. Evaluating SHA-256 is also difficult because it is not known which attacks it was designed to resist, or which safety margins were assumed.
There is thus doubt about the design philosophy of the MD/SHA family. Since the current class of functions is flawed, one way to counter this threat is to upgrade to a stronger hash function. An alternative is Message Pre-processing, in which the given message (input) is pre-processed before being hashed. The technique can be combined with MD5 or SHA-1 so that applications are no longer vulnerable to the known collision attacks. The rationale is that pre-processing makes the message more random before it is passed into the hash function; this reduces the redundancy in the input data and thus lowers the probability of finding a collision.
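The structure of the approach, hashing $H(P(m))$ instead of $H(m)$, can be sketched in Python. The pre-processing function here (`preprocess`, with the constants `MULT`, `ADD` and `IV`) is entirely hypothetical: a toy chained byte-wise affine map chosen only because it is easy to invert. It is not any published or patented pre-processing function, and no claim is made that it adds collision resistance. SHA-1 stands in for the underlying hash, as in the text.

```python
import hashlib

# Hypothetical toy parameters, for illustration only.
MULT = 167                     # odd, hence invertible modulo 256
MULT_INV = pow(MULT, -1, 256)  # modular inverse (Python 3.8+)
ADD = 0x5A
IV = 0x3C

def preprocess(msg: bytes) -> bytes:
    """Toy bijective pre-processing: a chained byte-wise affine map,
    so each output byte depends on all earlier input bytes."""
    out = bytearray()
    prev = IV
    for x in msg:
        y = (MULT * x + prev + ADD) % 256
        out.append(y)
        prev = y
    return bytes(out)

def unpreprocess(data: bytes) -> bytes:
    """Inverse of preprocess, showing that the map is a bijection."""
    out = bytearray()
    prev = IV
    for y in data:
        out.append((MULT_INV * (y - prev - ADD)) % 256)
        prev = y
    return bytes(out)

def preprocessed_hash(msg: bytes) -> str:
    """Hash the pre-processed message with SHA-1: H(P(m))."""
    return hashlib.sha1(preprocess(msg)).hexdigest()

print(preprocessed_hash(b"The quick brown fox"))
```

Because `preprocess` is a bijection, two distinct messages stay distinct after pre-processing, so $H(P(m))$ has no collisions that $H$ itself lacks: any collision $H(P(m)) = H(P(m'))$ with $m \neq m'$ is also a collision of $H$ on the distinct inputs $P(m)$ and $P(m')$. Whether a particular $P$ actually disrupts known differential attacks is a separate question that this toy does not address.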
A hash function is a one-way function that maps a message of arbitrary length to a fixed-length sequence of bits. A secure hash function has two basic requirements: collision resistance, meaning it should be hard to find two different messages with the same hash value, and pre-image resistance, meaning that, given a hash value, it should be hard to find a message that produces that hash value.
The definitions used are:
- The hash value of a message $m$ is denoted $H(m)$.
- Collision: find two distinct messages $m$ and $m'$ such that $H(m) = H(m')$.
- 1st pre-image: given a hash value $HV$, find a message $m$ such that $H(m) = HV$.
- 2nd pre-image: given a message $m$, find another message $m' \neq m$ such that $H(m') = H(m)$.
For a hash function with an $n$-bit output:
- A brute-force search finds a collision in about $2^{n/2}$ hash operations (the birthday bound), so finding a collision should require at least $2^{n/2}$ operations.
- Brute-force searches find 1st and 2nd pre-images in about $2^n$ hash operations, so finding either should require at least $2^n$ operations.
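The gap between the $2^{n/2}$ and $2^n$ bounds can be observed directly on a deliberately weak hash. In this sketch the function `toy_hash` is a hypothetical $n$-bit hash made by keeping the leading bits of SHA-256, and the message formats `msg-…` and `try-…` are arbitrary; only the search strategies are the point.

```python
import hashlib
from itertools import count

def toy_hash(msg: bytes, n_bits: int) -> int:
    """Hypothetical n-bit hash: the leading n_bits of SHA-256."""
    value = int.from_bytes(hashlib.sha256(msg).digest(), "big")
    return value >> (256 - n_bits)

def brute_force_collision(n_bits: int):
    """Birthday search: hash distinct messages until two share a value.
    Expected cost is on the order of 2**(n_bits / 2) evaluations."""
    seen = {}
    for i in count(1):
        m = b"msg-%d" % i
        h = toy_hash(m, n_bits)
        if h in seen:
            return seen[h], m, i
        seen[h] = m

def brute_force_preimage(target: int, n_bits: int):
    """Try messages until one hashes to target (a 1st pre-image).
    Expected cost is on the order of 2**n_bits evaluations."""
    for i in count(1):
        m = b"try-%d" % i
        if toy_hash(m, n_bits) == target:
            return m, i

m1, m2, tries_c = brute_force_collision(32)
target = toy_hash(b"hello", 16)
m, tries_p = brute_force_preimage(target, 16)
print(m1, m2, tries_c)
print(m, tries_p)
```

Both searches should take on the order of $2^{16}$ evaluations: a 32-bit hash offers roughly the same collision resistance as a 16-bit hash offers pre-image resistance, which is why digest lengths are chosen with the birthday bound in mind.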
A cryptographic hash function is a function with certain additional security properties to make it suitable for information security applications such as authentication and message integrity.
In 1990, Ronald Rivest proposed the MD4 hash function (specified in RFC 1320). Hans Dobbertin published collision attacks on MD4 in 1996.
In 1991, Ronald Rivest designed MD5 as a strengthened version of MD4. The output of MD5 is 128 bits. In 2004, Xiaoyun Wang, Dengguo Feng, Xuejia Lai and Hongbo Yu published collisions for the full MD5.
The SHA (Secure Hash Algorithm) family consists of five functions, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512, designed by the National Security Agency (NSA) and published by NIST as a U.S. Federal Information Processing Standard. In 1993, NIST published the Secure Hash Standard (SHS), whose hash function is now called SHA-0. Joux, Carribault, Lemuet and Jalby published collisions for the full SHA-0 in 2004. In February 2005, Xiaoyun Wang, Yiqun Lisa Yin and Hongbo Yu announced an attack that can find collisions in the full version of SHA-1 with fewer than $2^{69}$ operations.
The recent attacks on MD5, SHA-0 and SHA-1 by Wang et al. have given a huge impetus to research on designing practical cryptographic hash functions and on cryptanalysis of existing ones. Hitachi Ltd. has patented special-purpose hash functions with collision-free and one-way properties. IBM has published SHA-IME (improved message expansion) to resist differential attacks on SHA. Microsoft R&D has performed cryptanalysis of hash functions using Boolean Satisfiability (SAT) solvers. Tata Consultancy Services (TCS) has a patent pending on a cryptographic research result that introduces a bijective message pre-processing function (MP) intended to reduce hash collisions.