Exact match search tables are frequently used in data networking to resolve destination ports or forwarding addresses. Common search keys are destination Ethernet Media Access Control (MAC) addresses for Ethernet switching, destination Internet Protocol (IP) addresses for Address Resolution Protocol (ARP) table resolution, or Multiprotocol Label Switching (MPLS) labels for label switch routing. High-bandwidth devices used in these applications require search tables with high search rates and deterministic low latency.
Existing hash-based search tables lose efficiency as they fill, because the probability of collision in the hash algorithm becomes unacceptably high as the table approaches full occupancy. Radix or Patricia tree-based implementations, which are also common in networking devices, have unpredictable latency that grows with table size, leading to uncertainty in the achievable throughput.
Probability of Hash Collision
The quality of a typical well-balanced hash function can be measured by the probability that a new entry will collide with an entry already in the table. FIG. 1 is a graph 10 illustrating the probability Q(n) of collision of item n with existing entries in a simple hash, for a table of size 32768 at different fill levels. As can be seen in the graph 10, the probability of collision exceeds 60% as the table becomes full. This probability distribution also assumes that any collisions seen in previous additions to the table have been resolved. This is acceptable for software applications, which typically run at a low fill percentage (sometimes as low as 2%), but it is unacceptable for a networking chip.
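No formula for Q(n) is given here. One model that reproduces the roughly 63% end point of FIG. 1 (that is, 1 - 1/e) is Q(n) = 1 - (1 - 1/N)^(n-1): the chance that at least one of the n - 1 earlier entries hashed to the same bucket as entry n, assuming uniform hashing into N buckets. The Python sketch below evaluates this assumed model at several fill levels; `q_collision` is a hypothetical helper name.

```python
N = 32768  # table size used in FIG. 1

def q_collision(n: int, buckets: int) -> float:
    """Assumed model of Q(n): probability that entry n hashes to a bucket that
    at least one of the n - 1 earlier entries also hashed to (uniform hashing)."""
    return 1.0 - (1.0 - 1.0 / buckets) ** (n - 1)

# Evaluate the model from a sparsely filled table up to a full one.
for fill in (0.02, 0.25, 0.50, 0.75, 1.00):
    n = max(1, round(fill * N))
    print(f"fill {fill:6.1%}: Q(n) = {q_collision(n, N):.3f}")
```

Under this model Q(n) stays near 2% at 2% fill but rises to about 0.632 when the table is full, which is the regime a networking chip must operate in.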
In applications where removal of an existing entry in the case of collision is not permitted, such as MPLS label switch routing tables, all collisions must be resolved. If the typical approach of building a linked list of entries at the colliding hash value is employed, search latency is no longer deterministic, because a search can no longer be guaranteed to complete in a single look-up cycle (one read of the table). As a result, the packet processing rate cannot be guaranteed. In this case, where no collision may remain unresolved and only a single value may be stored per hash result, the probability of collision is much worse.
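To make the latency argument concrete, the sketch below builds a chained (linked-list) hash table and counts how many table reads a search for each key needs, taking one read per chain element visited. It assumes 1000 random, distinct 20-bit MPLS labels in 32768 buckets and a BLAKE2b-based hash; `bucket_of` and `reads_per_lookup` are hypothetical names, not part of any design described here.

```python
import hashlib
import random

def bucket_of(label: int, buckets: int) -> int:
    # Hypothetical hash: BLAKE2b of the label bytes, reduced modulo the table size.
    digest = hashlib.blake2b(label.to_bytes(4, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % buckets

def reads_per_lookup(labels, buckets: int) -> dict:
    """Build a chained hash table and return, for each label, the number of
    table reads a search needs: its 1-based position in its bucket's chain."""
    chains = [[] for _ in range(buckets)]
    for label in labels:
        chains[bucket_of(label, buckets)].append(label)
    return {label: pos for chain in chains for pos, label in enumerate(chain, start=1)}

rng = random.Random(1)
labels = rng.sample(range(1 << 20), 1000)  # 1000 distinct 20-bit labels
reads = reads_per_lookup(labels, 32768)
print("worst-case reads per search:", max(reads.values()))
```

Any chain longer than one element means some searches take more than one look-up cycle, and how long the longest chain gets depends on the data set rather than on the design.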
The probability of collision can be modeled in the same way as the birthday problem in probability theory, as shown in FIG. 2. A graph 20 in FIG. 2 illustrates the probability P(n) of collision for a 1-way hash with 32768 buckets. As can be seen in the graph 20, the probability of a collision in the data set exceeds 90% at fewer than 400 entries in a 32768-entry hash table. Clearly, a simple 1-way hash function is unacceptable for arbitrary data sets where no collisions are permitted.
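For a 1-way hash with N buckets, the standard birthday-problem formula is P(n) = 1 - prod_{k=0}^{n-1} (N - k)/N, assuming uniform, independent hashing. The Python sketch below evaluates it for N = 32768 and finds the first entry count at which P(n) exceeds 90%; the helper names are hypothetical.

```python
def birthday_collision_probability(n: int, buckets: int) -> float:
    """P(n): probability that at least two of n keys share a bucket."""
    p_no_collision = 1.0
    for k in range(n):
        # Key k+1 must avoid the k buckets already occupied.
        p_no_collision *= (buckets - k) / buckets
    return 1.0 - p_no_collision

def entries_to_exceed(threshold: float, buckets: int) -> int:
    """Smallest n with P(n) > threshold, built up one entry at a time."""
    p_no_collision, n = 1.0, 0
    while 1.0 - p_no_collision <= threshold:
        p_no_collision *= (buckets - n) / buckets
        n += 1
    return n

print(entries_to_exceed(0.9, 32768))
```

Under this model the 90% threshold is crossed at 389 entries, roughly 1.2% of the table, consistent with the "fewer than 400 entries" read from FIG. 2.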
Moving to a multi-way hash, in which multiple hash functions are executed in parallel, helps solve this problem. A common implementation is a cuckoo hash, in which n parallel hash functions are employed. When a new entry collides in one hash function, it is added using another hash function. If it collides in all of the hash functions, one of the existing entries is evicted to make space for the new entry, and the evicted entry is then reinserted using the multiple hash functions.
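The insertion procedure described above can be sketched as follows. This is a minimal illustration of a generic cuckoo hash, not the specific design of this document: the per-way hash (BLAKE2b with the way number prefixed), the one-entry-per-bucket layout, the round-robin choice of which way to evict from, and the `max_kicks` bound are all assumptions, and `CuckooTable` is a hypothetical name.

```python
import hashlib

class CuckooTable:
    """Minimal k-way cuckoo hash sketch: one entry per bucket, unique keys assumed."""

    def __init__(self, buckets_per_way: int, ways: int = 2, max_kicks: int = 64):
        self.buckets = buckets_per_way
        self.tables = [[None] * buckets_per_way for _ in range(ways)]
        self.max_kicks = max_kicks  # assumed bound on evictions per insert

    def _index(self, way: int, key: int) -> int:
        # Assumed independent hash per way: prefix the way number before hashing.
        data = bytes([way]) + key.to_bytes(8, "little")
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.buckets

    def lookup(self, key: int):
        # Exactly one read per way; in hardware the ways can be read in parallel.
        for way, table in enumerate(self.tables):
            entry = table[self._index(way, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key: int, value) -> bool:
        """Return False if max_kicks evictions did not free a bucket; the entry
        still in hand at that point (possibly an earlier key) is left unstored."""
        entry, victim_way, kicks = (key, value), 0, 0
        while True:
            # 1. Place the entry in any way whose bucket for it is free.
            for way, table in enumerate(self.tables):
                idx = self._index(way, entry[0])
                if table[idx] is None:
                    table[idx] = entry
                    return True
            if kicks == self.max_kicks:
                return False
            # 2. All ways collide: evict the occupant of one way, then reinsert it.
            idx = self._index(victim_way, entry[0])
            evicted = self.tables[victim_way][idx]
            self.tables[victim_way][idx] = entry
            entry = evicted
            victim_way = (victim_way + 1) % len(self.tables)
            kicks += 1
```

A search costs exactly one read per way, which keeps lookup latency deterministic; the unbounded part is insertion, which is why the sketch caps evictions with `max_kicks` and reports failure instead of looping indefinitely.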
FIG. 3 is a graph 30 illustrating the probability of collision for the nth entry in a 2-way hash when the fill levels of both hash tables are similar, assuming independent hash functions. Extending this to more hash functions reduces the probability of collision further. However, each additional hash function increases the size of the implementation and the number of memory accesses needed to search the hash tables. Further, the latency of cuckoo insertions must be hidden so as not to impair the add rate of the overall hash table, because it is generally not known in advance how many iterations of the cuckoo algorithm are needed before an insertion (add) no longer generates an overflow (a collision in all hash functions).
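A rough way to see this trade-off is to extend a simple single-table model to k ways: if earlier entries are spread evenly over k tables of M buckets each (similar fill levels) and the hash functions are independent, the chance that entry n collides in every way is approximately [1 - (1 - 1/M)^((n-1)/k)]^k. The sketch below holds the total bucket count at 32768, fills the table to 50%, and compares 1, 2, and 4 ways. The model and the name `all_ways_collide` are assumptions for illustration and are not taken from FIG. 3.

```python
TOTAL_BUCKETS = 32768  # total memory held constant across configurations

def all_ways_collide(n: int, buckets_per_way: int, ways: int) -> float:
    """Assumed model: probability that entry n finds its bucket occupied in
    every way, with earlier entries split evenly and independent hashes."""
    earlier_per_way = (n - 1) / ways
    q_way = 1.0 - (1.0 - 1.0 / buckets_per_way) ** earlier_per_way
    return q_way ** ways

n = TOTAL_BUCKETS // 2  # half-full table
for ways in (1, 2, 4):
    m = TOTAL_BUCKETS // ways
    p = all_ways_collide(n, m, ways)
    print(f"{ways}-way: P(collide in all ways) = {p:.4f}, reads per search = {ways}")
```

Under these assumptions the all-ways collision probability at 50% fill falls from about 0.39 (1 way) to about 0.15 (2 ways) and 0.02 (4 ways), while each search needs one memory read per way.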
High-performance search tables that enable collision-less exact matching of large data sets are required for packet networking applications such as MPLS label switch routing, IP address resolution tables that resolve destination MAC addresses, and higher-performance Ethernet bridging. Hash-based solutions provide low latency and high throughput. However, existing hash solutions suffer from collisions that depend on the data set, which prevents their use in MPLS Label Switch Router (LSR) applications.
Improvements in search tables and related hash functions are desirable.