Hash functions can be used for many purposes: for example, to index data in hash tables, for fingerprinting, for load-balancing distribution, to look up data in databases, to detect duplicate data or uniquely identify files, and as checksums to detect accidental data corruption. In a load-balancing application, for example, the hash algorithm may use an Internet Protocol (IP) address of the client, a Media Access Control (MAC) address of the client, the value of an HTTP header, or similar data as the basis for server selection. In such applications, the same client may be served by the same server even when the list of available servers is modified during the client's session. This property may also make such an algorithm useful for applications that require the storage of server-side state information, such as cookies.
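As an illustrative sketch of hash-based server selection as described above (the server names and client address are hypothetical, chosen only for illustration), a client identifier can be hashed and reduced modulo the size of the server pool:

```python
import hashlib

# Hypothetical server pool; names are illustrative only.
SERVERS = ["server-a", "server-b", "server-c", "server-d"]

def pick_server(client_ip: str) -> str:
    """Map a client identifier (here an IP address) to a server
    by hashing it and reducing the digest modulo the pool size."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest, "big") % len(SERVERS)
    return SERVERS[index]

# The same client always maps to the same server while the pool is unchanged.
assert pick_server("192.0.2.17") == pick_server("192.0.2.17")
```

Note that with this simple modulo scheme, changing the number of servers remaps most clients; avoiding that remapping is the motivation for consistent hashing, discussed next.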
The term “consistent hashing” refers to a way of distributing requests among a changing population of Web servers. Each slot is then represented by a node in a distributed system. The addition (joins) and removal (leaves/failures) of nodes require items to be re-shuffled when the number of slots/nodes changes. The hash function need not preserve structure. Ideally, for each input datum, the probability of producing any of the possible output values should be equal, so that any inequalities in the frequency distribution of the input data are transformed into a uniform distribution of output data.
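A minimal sketch of a consistent-hashing ring follows (the class name, replica count, and use of SHA-256 for placing points on the ring are assumptions for illustration). Nodes are hashed onto a circle of points, and a key is served by the first node clockwise from the key's own hash point, so that removing a node re-shuffles only the keys that node owned:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hashing ring: each node is hashed onto a
    circle of points; a key is served by the first node clockwise
    from the key's own hash point."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas   # virtual points per physical node
        self._points = []          # sorted hash points on the ring
        self._owner = {}           # hash point -> owning node
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            p = self._hash(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            p = self._hash(f"{node}#{i}")
            self._points.remove(p)
            del self._owner[p]

    def lookup(self, key: str) -> str:
        # First point at or after the key's hash, wrapping around the circle.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With this structure, removing a node reassigns only the keys that node owned; all other keys keep their previous assignment, which is the defining property sketched above.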
Problems may arise, however, both accidentally and by deliberate action. Accidentally, the users may consist of different groups that request access to the resources to different degrees. If these groups are unfavorably balanced, the users that are guided to a certain resource by the hash function may request access to that resource to a larger extent than the other users. This resource is then subject to a larger load than the other resources, resulting in a biased load balance among the resources.
Deliberately, so-called “hash attacks” may occur, which are intended to cause a biased load balance among the resources. Hash attacks are generally made possible by the attackers having sufficient knowledge about the system and/or making use of information that is output from the system comprising the resources. The attackers then arrange for each request for resources, when passing through the hash function, to be guided to one and the same resource. This resource is then subject to an unusually high load and functions inefficiently, which may result in a so-called “denial of service,” in which the resource does not accept any more users. This denial of service may affect the service efficiency of the whole system.
A cryptographic hash function is a hash function, i.e. an algorithm that takes an arbitrary block of data and returns a fixed-size bit string, the (cryptographic) hash value, such that an (accidental or intentional) change to the data will (with very high probability) change the hash value. The data to be encoded are often called the “message,” and the hash value is sometimes called the message digest or “digest.”
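The two defining properties just described, namely a fixed-size digest for arbitrary input and a digest that changes under any change to the message, can be observed directly with a standard library hash (SHA-256 is used here only as a representative cryptographic hash function):

```python
import hashlib

# Any message, short or long, yields a fixed-size digest.
short_digest = hashlib.sha256(b"abc").hexdigest()
long_digest = hashlib.sha256(b"a" * 1_000_000).hexdigest()
assert len(short_digest) == len(long_digest) == 64  # 256 bits = 64 hex chars

# A one-character change to the message yields a completely different digest.
changed_digest = hashlib.sha256(b"abd").hexdigest()
assert changed_digest != short_digest
```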
Cryptographic hash functions have many information security applications, notably in digital signatures, message authentication codes (MACs), and other forms of authentication. They can also be used as ordinary hash functions, to index data in hash tables, for fingerprinting, to detect duplicate data or uniquely identify files, and as checksums to detect accidental data corruption. In information security contexts, cryptographic hash values are sometimes called (digital) fingerprints, checksums, or just hash values, even though all these terms stand for functions with rather different properties and purposes.
One of the best-known cryptographic hash functions is the MD5 (Message-Digest algorithm 5) algorithm developed by Ronald Rivest. Other common algorithms are SHA-1 (Secure Hash Algorithm 1) and its successors SHA-2 and SHA-3, published by the National Institute of Standards and Technology (NIST) as U.S. Federal Information Processing Standards (FIPS).
Even if a cryptographic hash function is based on a sound mathematical function, it may still be susceptible to denial-of-service attacks: under some circumstances its load distribution may follow a so-called “Zipf's law,” “power law,” or “Pareto distribution,” wherein some particular resource is subject to an unusually high load. This property, or similar properties, may be maliciously exploited to cause a biased load balance among the resources, resulting in a denial of service.
Two important tradeoffs in hash functions, for the class of hash functions used to perform lookups, are: (1) complexity versus speed of calculation—too simple and the hash is easily broken; too complex and the hash takes too long to calculate; and (2) digest distribution and avalanche properties—a single-bit change in the input should cause n bits to change in the hash digest output value, the strongest case being where n is about half the size of the hash digest output.
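The avalanche property described in point (2) can be measured empirically by flipping a single input bit and counting how many digest bits change (the message text is arbitrary, and SHA-256 is used as a representative hash; for an ideal hash the count is close to half of the 256 digest bits):

```python
import hashlib

def bit_hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = bytearray(b"The quick brown fox")
d1 = hashlib.sha256(bytes(msg)).digest()

msg[0] ^= 0x01                       # flip a single input bit
d2 = hashlib.sha256(bytes(msg)).digest()

flipped = bit_hamming(d1, d2)
# A strong avalanche effect flips roughly half of the 256 digest bits.
print(f"{flipped} of 256 digest bits changed")
```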
Some have proposed processors (e.g. U.S. Pat. No. 8,255,703) or coprocessors (e.g. U.S. Pat. No. 7,240,203) capable of executing an entire secure hashing algorithm. One drawback of such an approach is that it does not fit easily into a standard execution pipeline of a modern microprocessor without special provisions for such things as the handling of interrupts or the concurrent superscalar execution of other instructions. Another mismatch with standard execution pipelines is the latency required for executing an entire secure hashing algorithm.
Modern processors often include instructions to provide operations that are computationally intensive, but offer a high level of data parallelism that can be exploited through an efficient implementation using various data storage devices, such as for example, single instruction multiple data (SIMD) vector registers. The central processing unit (CPU) may then provide parallel hardware to support processing vectors. A vector is a data structure that holds a number of consecutive data elements. A vector register of size M may contain N vector elements of size O, where N=M/O. For instance, a 64-byte vector register may be partitioned into (a) 64 vector elements, with each element holding a data item that occupies 1 byte, (b) 32 vector elements to hold data items that occupy 2 bytes (or one “word”) each, (c) 16 vector elements to hold data items that occupy 4 bytes (or one “doubleword”) each, or (d) 8 vector elements to hold data items that occupy 8 bytes (or one “quadword”) each. The nature of the parallelism in SIMD vector registers could be well suited for the handling of secure hashing algorithms.
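The partitioning arithmetic above, N = M/O vector elements of size O in a register of M bytes, can be modeled by reinterpreting the same 64-byte buffer at different element widths (this is only an illustration of the lane counts, not an actual SIMD implementation; the format codes are standard byte, word, doubleword, and quadword widths):

```python
# A 64-byte "vector register" modeled as a plain byte buffer.
M = 64
reg = bytearray(range(M))

# Reinterpret the same 64 bytes as elements of size O = 1, 2, 4, 8 bytes.
for fmt, O in (("B", 1), ("H", 2), ("I", 4), ("Q", 8)):
    lanes = memoryview(reg).cast(fmt)
    assert len(lanes) == M // O       # N = M / O vector elements
    print(f"{O}-byte elements: {len(lanes)} lanes")
```

This reproduces the four partitionings listed above: 64 one-byte elements, 32 word elements, 16 doubleword elements, or 8 quadword elements.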
To date, potential solutions to such complexities, mismatches, performance limiting issues, and other bottlenecks have not been adequately explored.