Basic hashing works by computing a hash index I = H(K), I ∈ S_I, where K ∈ S_K is the key and H( ) is a hash function that maps elements of key space S_K into a smaller index space S_I. I is used to index a hash table, which may either store one or more keys which hash to the same index directly, or a pointer to the key storage.
Hashing is frequently used as a mechanism to perform exact match searches of fixed- or variable-length keys. These searches may be performed to extract data from a results database that is associated with each stored key: e.g., Quality of Service (QoS) processing information for a packet flow which is defined by a key composed of certain packet header values. While hashing has good (O(1)) average search time, it has a worst case search time of O(N) for N keys, due to the possibility of hash collisions.
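The basic scheme described above can be sketched as follows. This is a minimal illustration only, assuming chaining for collision resolution and CRC-32 as the hash function H( ); the class name, method names, and the sample keys and QoS strings are hypothetical and do not come from any particular implementation.

```python
import zlib


class ChainedHashTable:
    """Illustrative hash table: each bin holds a list (chain) of (key, result) pairs."""

    def __init__(self, num_bins):
        self.num_bins = num_bins
        self.bins = [[] for _ in range(num_bins)]

    def index(self, key: bytes) -> int:
        # I = H(K): map the key into the smaller index space [0, num_bins).
        # CRC-32 is an assumed choice of H( ), used here only for illustration.
        return zlib.crc32(key) % self.num_bins

    def insert(self, key: bytes, result) -> None:
        chain = self.bins[self.index(key)]
        for i, (stored_key, _) in enumerate(chain):
            if stored_key == key:
                chain[i] = (key, result)  # overwrite the result of an existing key
                return
        chain.append((key, result))

    def lookup(self, key: bytes):
        # Exact match search: compare the full key against every entry in the bin.
        # The cost is proportional to the chain length, so it is O(1) on average
        # but O(N) if all N keys collide into the same bin.
        for stored_key, result in self.bins[self.index(key)]:
            if stored_key == key:
                return result
        return None
```

For example, after `table = ChainedHashTable(8)` and `table.insert(b"flow-a", "qos-gold")`, the call `table.lookup(b"flow-a")` returns the stored result and `table.lookup(b"flow-b")` returns `None`. Constructing the table with a single bin forces every key into one chain, which reproduces the O(N) worst case.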
FIG. 1 is a graph 100 illustrating the probability of hash collision P for a new key inserted into a hash table as a function of the table's load α, defined as the ratio of already inserted keys N to the number of bins B in the hash table. Here, simple uniform hashing is assumed, that being where any key will hash into any bin with equal probability. In FIG. 1, the results are plotted for B ranging from 100 to 10000000, and it is observed that the resulting curve is insensitive to the absolute value of B. Note that P is approximately proportional to α for small values of α. The collision probability P at load α is equivalent to the expected fraction of occupied hash bins at that load. This is also equal to the expected fraction of keys that collide with another key at that load. Hash collisions can be resolved through a variety of mechanisms, including chaining, double hashing, open addressing, coalesced hashing, 2-choice hashing, and 2-left hashing. Disadvantageously, none of these mechanisms offers a deterministic search time for every key.
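The shape of such a curve can be reproduced with a short calculation. The sketch below is not taken from the figure; it assumes simple uniform hashing with independently placed keys, under which a given bin is still empty after N insertions with probability (1 - 1/B)^N, so a new key collides with probability P = 1 - (1 - 1/B)^N, which is close to 1 - e^(-α) with α = N/B. The function names are hypothetical.

```python
import math


def collision_probability(n_keys: int, n_bins: int) -> float:
    """Probability that a new key lands in an occupied bin after n_keys insertions."""
    # Each inserted key misses a given bin with probability (1 - 1/B).
    return 1.0 - (1.0 - 1.0 / n_bins) ** n_keys


def collision_probability_from_load(load: float) -> float:
    """Large-B approximation of the same quantity, as a function of the load only."""
    return 1.0 - math.exp(-load)


if __name__ == "__main__":
    # Same load (0.5) at very different table sizes gives nearly the same P,
    # which is the insensitivity to B noted in the text.
    for n_bins in (100, 10_000, 10_000_000):
        p = collision_probability(n_bins // 2, n_bins)
        print(f"B={n_bins:>10}  load=0.5  P={p:.4f}")
```

Because 1 - e^(-α) is approximately α when α is small, this approximation is also consistent with the near-proportionality of P to α at low load.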
An arbitrarily low ratio of colliding entries can only be achieved by operating at a low load; that is, by making B large relative to N. However, this results in a waste of memory space.
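The memory cost of this trade-off can be estimated with a rough calculation. The sketch below assumes simple uniform hashing and the large-table approximation P ≈ 1 - e^(-α), with α = N/B; solving for B gives B ≈ N / (-ln(1 - P)). The function name and the sample figures are hypothetical and serve only to illustrate the scale involved.

```python
import math


def bins_required(n_keys: int, target_collision_ratio: float) -> int:
    """Estimate the number of bins B needed to hold n_keys at a target collision ratio.

    Assumes simple uniform hashing and P ~= 1 - exp(-N/B).
    """
    if not 0.0 < target_collision_ratio < 1.0:
        raise ValueError("target_collision_ratio must be strictly between 0 and 1")
    # Load at which the approximate collision probability equals the target.
    max_load = -math.log1p(-target_collision_ratio)
    return math.ceil(n_keys / max_load)


if __name__ == "__main__":
    for ratio in (0.1, 0.01, 0.001):
        b = bins_required(1_000_000, ratio)
        print(f"target ratio {ratio}: about {b} bins for 1000000 keys")
```

Under these assumptions, holding the colliding fraction near 1% requires roughly 100 bins per stored key, so nearly all of the table is empty.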
Exact match searching for fixed- or variable-length keys in databases is a common problem in computer science, especially in the context of packet forwarding, e.g., Ethernet Media Access Control (MAC) lookup and Internet Protocol (IP) 6-tuple flow lookup. Often in these applications, tens of millions or hundreds of millions of searches must be completed per second. In the context of packet forwarding, the database key might be anywhere from 16 to 48 bytes in size. Conventional solutions often involve sophisticated memory technology, such as the use of binary or ternary content addressable memories (CAMs), or combinations of well-known hashing techniques with memory technology, to retrieve those keys which are not conveniently resolved by the hashing technique.
Conventional hash-based solutions cannot provide deterministic search time due to the need to resolve hash collisions, which in the worst case can be O(N) for N keys, whereas solutions which depend on sophisticated memory technology are typically expensive, have low density, and have high power consumption.
The concept of using multiple hash tables is known in the art. For example, it is a basic component of the well-known 2-choice hashing and 2-left hashing methods. The method described in U.S. Pat. No. 5,920,900 to N. Poole, et al., while it uses multiple hash tables for collision resolution, does not bound every search to at most two hash table lookups.
What is desired is a solution that provides deterministic search time, with bounded memory.