Hash tables are widely used in network applications because they offer constant-time primitive operations such as query, insertion, and deletion. For example, hash tables are commonly used to store memory addressing information for data elements held in high-capacity storage systems. However, collisions become frequent as the table load increases. Newly inserted elements that collide with existing elements are placed in additional slots, which lengthens the probe sequence followed during a query. As a consequence, the cost of primitive operations rises and performance degrades. While well-known collision resolution policies maintain average performance under high loads and increased collisions, their performance nevertheless becomes highly non-deterministic.
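The effect of load on probe sequence length can be illustrated with a minimal sketch. It assumes linear probing as the collision resolution policy, which the text above does not specify; the function name `average_probes`, the table size, and the load values are hypothetical choices made only for illustration.

```python
import random

def average_probes(num_slots, load, seed=0):
    """Fill a linear-probing table to the given load and return the
    mean number of slots inspected per successful query."""
    rng = random.Random(seed)
    table = [None] * num_slots
    keys = rng.sample(range(10**9), int(num_slots * load))  # distinct keys

    # Insertion: a colliding key moves to the next free slot.
    for key in keys:
        slot = key % num_slots
        while table[slot] is not None:
            slot = (slot + 1) % num_slots
        table[slot] = key

    # Query: follow the same probe sequence until the key is found.
    total = 0
    for key in keys:
        slot = key % num_slots
        probes = 1
        while table[slot] != key:
            slot = (slot + 1) % num_slots
            probes += 1
        total += probes
    return total / len(keys)

if __name__ == "__main__":
    for load in (0.25, 0.5, 0.75, 0.9):
        print(f"load {load:.2f}: {average_probes(10007, load):.2f} probes per query")
```

Under these assumptions the mean probe count stays close to one at low load and grows as the table fills, which is the rising cost of primitive operations described above.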
In modern hardware devices, such as network processors, system performance degrades sharply due to the non-determinism of many hashing techniques. The primary reason is that such devices coordinate multiple threads to accelerate hash operations, and because the processing order is critical, the threads must be synchronized. Since the synchronization mechanisms ensure that a collection of requests is handled in the order in which the requests arrive, the slowest thread determines the overall system performance. As the number of threads with non-deterministic performance increases, the slowest thread tends to become much slower, and system performance degrades sharply. For such hardware devices, it is critical to maintain a high degree of determinism through effective collision resolution.
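The claim that the slowest thread becomes slower as threads are added can be sketched with a small simulation. The latency model here is an assumption made for illustration only: each thread's lookup costs one probe plus a random number of extra probes drawn from an exponential distribution. The function name `mean_batch_latency` is hypothetical.

```python
import random

def mean_batch_latency(num_threads, trials=2000, seed=0):
    """Mean completion time of a synchronized batch, where the batch
    finishes only when its slowest thread finishes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Assumed model: 1 probe plus a random number of extra probes.
        latencies = [1 + int(rng.expovariate(1.0)) for _ in range(num_threads)]
        total += max(latencies)  # in-order handling waits for the slowest
    return total / trials

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(f"{n:3d} threads: mean batch latency {mean_batch_latency(n):.2f} probes")
```

The per-thread distribution is identical in every run; only the number of threads changes. The batch latency still rises, because the maximum over more random samples is larger on average.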
Also, false positives may occur in a multiple-segment hashing system, such as Peacock hashing or FHT. False positives may be classified into two categories. The first category occurs when an element already present in one of the hash tables is wrongly reported by the on-chip Bloom filter as present in one or more other tables. The second category occurs when an element not present in any hash table is wrongly reported as present in one or more hash tables. Peacock hashing and FHT do not discriminate between these two categories. However, the first category is of great importance in both theory and practice. A Bloom filter, whether the basic form or an advanced variant, can never avoid the first category of false positive because it is only an approximate summary, designed to save memory. Reducing the false positive rate below 1% requires the costly remedy of 10 bits of on-chip memory for every table element. Furthermore, in a high-speed network device such as a router, millions of packets pass through every second and IP address lookup must be performed at wire speed. Even with a 1% false positive rate, a multiple-segment hashing system incorrectly matches tens of thousands of packets per second to routing table entries. This leads to costly and unnecessary probing, which also degrades system performance. As network traffic increases, the problem tends to worsen. From this viewpoint, it is critical to reduce or even eliminate the first category of false positives.
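The 10-bit figure can be checked with the standard approximation for a basic Bloom filter, whose false positive rate is about (1 - e^(-k/b))^k for b bits per element and k hash functions. The sketch below assumes k is chosen near its optimum, b·ln 2. The function name `bloom_false_positive_rate` and the packet rate are hypothetical values used only for illustration.

```python
import math

def bloom_false_positive_rate(bits_per_element, num_hashes=None):
    """Approximate false positive rate of a basic Bloom filter."""
    if num_hashes is None:
        # Near-optimal number of hash functions: k = b * ln 2.
        num_hashes = max(1, round(bits_per_element * math.log(2)))
    return (1.0 - math.exp(-num_hashes / bits_per_element)) ** num_hashes

if __name__ == "__main__":
    for bits in (6, 8, 9, 10, 12):
        print(f"{bits:2d} bits/element: rate {bloom_false_positive_rate(bits):.4f}")

    # Hypothetical traffic level, chosen only to show the scale of the problem.
    packets_per_second = 2_000_000
    wasted = packets_per_second * bloom_false_positive_rate(10)
    print(f"about {wasted:,.0f} spurious matches per second")
```

Under this approximation, 9 bits per element leaves the rate above 1%, while 10 bits brings it below 1%. Even then, a rate just under 1% applied to millions of packets per second still yields spurious matches in the tens of thousands.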
Many network applications do not need to handle the second category of false positives. For example, IP address lookup forwards incoming packets using routing tables. A routing table contains a special rule, usually with the lowest priority, that matches any packet not matched by the other rules. Similarly, packet classification algorithms match incoming packets against packet classifiers. The lowest-priority rule in a classifier usually matches any packet, since all five fields in the rule are wildcards. Such applications include firewalls, access control list operations, etc. Some network applications may temporarily allow the second category of false positive. However, dynamic incremental updates add the unknown properties of any mismatched item to the database, so that such an item is not skipped the next time. An example is an Intrusion Detection System (IDS). If a target packet contains a signature that does not match any rule in the current IDS library, the signature is retrieved and added to the IDS library so that it matches subsequent packets with the same class of signature.
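The role of the lowest-priority catch-all rule can be sketched with a toy routing table. The prefixes, port names, and the function `next_hop` are hypothetical, and the linear scan stands in for whatever lookup structure a real device would use.

```python
import ipaddress

# Hypothetical routing table; the last entry is the catch-all default route.
ROUTES = [
    (ipaddress.ip_network("10.1.0.0/16"), "port-1"),
    (ipaddress.ip_network("10.0.0.0/8"), "port-2"),
    (ipaddress.ip_network("0.0.0.0/0"), "default-port"),
]

def next_hop(address):
    """Return the next hop of the longest matching prefix."""
    addr = ipaddress.ip_address(address)
    # The default route matches every IPv4 address, so matches is never empty.
    matches = [(net.prefixlen, hop) for net, hop in ROUTES if addr in net]
    return max(matches)[1]

if __name__ == "__main__":
    for ip in ("10.1.2.3", "10.9.9.9", "192.0.2.1"):
        print(ip, "->", next_hop(ip))
```

Because the default route matches every address, every lookup returns some entry. In this sketch there is no "element not present in any table" outcome for the application to handle, which is why the second category of false positive does not arise.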