1. Field of the Invention
Embodiments of this invention are related to hash functions.
2. Background Art
Hash functions are used in numerous applications, particularly in technologies related to network communications and security, in order to achieve faster table lookups. A hash function maps a first data set, which is typically larger, to a second data set. The input to the hash function may be referred to as a “key,” and the output may be referred to as a “hash value,” “hash code,” or “hash.”
The hash value identifies one or more entries in an array (also referred to as a “hash bucket”) that correspond to a particular key. A hash table is a collection of hash buckets. Each hash bucket can store any number of data items for lookup. In order to look up a data item corresponding to a key, the key is processed by a hash function to determine a corresponding hash value. After the hash value is determined, the hash bucket identified by that hash value is accessed within the hash table. Finding the desired data item within the hash bucket may require one or more operations to traverse the data items stored in the hash bucket.
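For illustration only, the lookup sequence described above can be sketched in Python as follows. The class name BucketHashTable, its method names, the default bucket count, and the use of Python's built-in hash() as the hash function are assumptions made for this sketch and are not taken from any particular implementation.

```python
class BucketHashTable:
    """Hash table as a collection of hash buckets (illustrative sketch)."""

    def __init__(self, num_buckets=8):
        # Each hash bucket is a list that can hold any number of (key, item) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Assumption: Python's built-in hash() stands in for the hash function.
        # The hash value is reduced to a bucket index.
        return hash(key) % len(self.buckets)

    def insert(self, key, item):
        bucket = self.buckets[self._index(key)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, item)  # replace the item for an existing key
                return
        bucket.append((key, item))

    def lookup(self, key):
        # Step 1: hash the key. Step 2: access the bucket.
        bucket = self.buckets[self._index(key)]
        # Step 3: traverse the data items stored in the bucket.
        for stored_key, item in bucket:
            if stored_key == key:
                return item
        return None
```

In this sketch, the cost of lookup() is one hash computation plus a traversal whose length equals the number of data items in the accessed bucket.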
It is strongly desired that a hash function distribute the set of input keys evenly over all the hash buckets in a hash table. When keys are distributed relatively evenly over the hash table, the expected lookup time for a key is a function of the load factor (i.e., the average number of data items stored per hash bucket) of the hash table. However, no single hash function can efficiently distribute all key sets evenly over the hash buckets. Also, any single hash function can be “defeated” with relative ease by submitting a set of keys chosen to cause excessive collisions on a subset of the buckets. Due to these and other practical weaknesses of single hash functions, some applications dynamically select from among a plurality of related hash functions, e.g., a hash function family.
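As a simple illustration of how a single, fixed hash function can be defeated, the following Python sketch compares bucket occupancy for a benign key set and for a key set chosen by an adversary. The hash function (key modulo the number of buckets), the constant NUM_BUCKETS, and both key sets are hypothetical and chosen only to make the effect easy to see.

```python
from collections import Counter

NUM_BUCKETS = 16  # hypothetical table size


def fixed_hash(key):
    # Assumption: a deliberately simple fixed hash function.
    return key % NUM_BUCKETS


def bucket_sizes(keys):
    """Return the number of keys that land in each bucket."""
    counts = Counter(fixed_hash(k) for k in keys)
    return [counts.get(b, 0) for b in range(NUM_BUCKETS)]


benign_keys = list(range(64))
# An adversary who knows fixed_hash submits only multiples of NUM_BUCKETS.
adversarial_keys = [NUM_BUCKETS * i for i in range(64)]

# Both key sets give the same load factor (items per bucket on average).
load_factor = len(benign_keys) / NUM_BUCKETS

benign_max = max(bucket_sizes(benign_keys))            # keys spread evenly
adversarial_max = max(bucket_sizes(adversarial_keys))  # all keys in one bucket
```

Both key sets have the same load factor, yet in the adversarial case every key falls into a single bucket, so a lookup degenerates into a traversal of all stored items.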
In “universal hashing,” a hash function is selected from among a family of such functions. The hash function may be selected randomly from the family such that predetermined mathematical properties are satisfied. Universal hash functions keep the expected number of collisions low even when the keys are selected by an adversary, provided that the adversary does not know which function of the family has been selected. Universal hash functions are frequently used in areas such as implementation of hash tables, randomized algorithms, and cryptography.
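One widely known example of such a family, shown here only as a sketch and not as the family contemplated by any particular embodiment, is the Carter-Wegman construction h(x) = ((a*x + b) mod p) mod m, where p is a prime larger than any key, m is the number of buckets, and a and b are chosen at random. The function name pick_hash, the constant P, and the assumption that keys are non-negative integers below P are introduced for this example.

```python
import random

# Assumption: keys are integers in [0, P). P is the Mersenne prime 2**31 - 1.
P = 2_147_483_647


def pick_hash(num_buckets, rng=random):
    """Randomly select one member of the family h(x) = ((a*x + b) mod P) mod m."""
    a = rng.randrange(1, P)  # a is drawn from [1, P - 1]
    b = rng.randrange(0, P)  # b is drawn from [0, P - 1]

    def h(key):
        return ((a * key + b) % P) % num_buckets

    return h
```

For any two distinct keys below P, the probability over the random choice of a and b that they collide is at most 1/m, which is the kind of predetermined mathematical property referred to above. A caller might select a new function with h = pick_hash(16) whenever a table is created or rebuilt.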
Conventionally, hash functions are implemented in hardware as well as in software. Hardware-implemented hash functions can, in general, support substantially higher throughput rates than software-implemented hash functions. Exemplary hash functions used in hardware implementations range from simple polynomial division of a key to more complex cryptographic hashes (e.g., MD4, MD5, SHA-1, and SHA-2). Polynomial division provides high throughput and is very efficient in hardware, but has poor characteristics as a hash function. Cryptographic hashes are typically expensive to implement in hardware and can be challenging to use in high-throughput applications. Moreover, the cost of implementing cryptographic hash functions in hardware may be wasteful in lookup-type applications, which are common in network and communications equipment.
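As a software sketch of hashing by polynomial division, the following Python function treats the bits of an integer key as the coefficients of a polynomial over GF(2) and returns the remainder after division by a generator polynomial. The function name poly_mod and the example polynomials are assumptions for illustration; a hardware implementation would typically realize the same arithmetic with shift registers and XOR gates.

```python
def poly_mod(key: int, poly: int) -> int:
    """Remainder of key(x) divided by poly(x), with coefficients in GF(2).

    Bit i of each integer is the coefficient of x**i. The result has
    fewer bits than poly and can serve as the hash value.
    """
    degree = poly.bit_length() - 1
    while key.bit_length() > degree:
        # Align the divisor with the leading term of the key and subtract.
        # Subtraction in GF(2) is XOR, which clears the leading bit.
        shift = key.bit_length() - poly.bit_length()
        key ^= poly << shift
    return key


# Hypothetical example: generator x**3 + x + 1 (binary 1011), key x**3.
example = poly_mod(0b1000, 0b1011)  # remainder is x + 1 (binary 011)
```

One reason such a function is easy to defeat is that it is linear: poly_mod(a ^ b, poly) equals poly_mod(a, poly) ^ poly_mod(b, poly) for any keys a and b, so colliding keys can be constructed by XORing any key with a multiple of the generator polynomial.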
However, hardware-implemented hash functions are a requirement for numerous applications, such as, for example, hashing implemented in network equipment and other communications equipment. What is needed, therefore, are hash functions that can be implemented efficiently in hardware and that are also difficult for adversaries to defeat.