In network elements, a media access control (MAC) forwarding data structure (also referred to as a MAC forwarding table or MAC forwarding database) is used to determine how to forward an incoming packet. Upon receiving a packet, the network element uses the destination MAC address (MAC) associated with that packet to look up a bridge interface in the MAC forwarding data structure. The bridge interface indicates which network interface (I/F) the network element should use to forward the packet. One such implementation of a MAC forwarding data structure uses a hash table to store key-value pairs in which the key is the MAC associated with the incoming packet and the value is the I/F used to forward packets for that MAC. As packets come in, an I/F is retrieved from the MAC forwarding data structure for the associated MAC. If the MAC forwarding data structure does not have an entry for that MAC, the network element can learn which bridge interface should be used for that MAC, and the resulting MAC-I/F pair is inserted into the MAC forwarding data structure as a new entry. A network element that handles 100 gigabits per second of minimum-sized packets handles approximately 150 million packets per second, and because the MAC forwarding data structure is consulted on every packet, it performs approximately 150 million lookups per second. However, typical rates for learning MAC-I/F associations (insertions) or deleting old associations are in the neighborhood of 100,000 per second. Thus, lookups occur on the order of 1,000 times as often as insertions or deletions.
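The lookup-and-learn cycle described above can be illustrated with a minimal sketch. Everything named here (`MacForwardingTable`, `handle_packet`, the interface names) is hypothetical, and the sketch assumes the common bridge behavior of learning an association from the source MAC of an incoming packet, which the text above does not specify. The closing arithmetic assumes a minimum Ethernet frame of 64 bytes plus 20 bytes of preamble and inter-frame gap on the wire.

```python
# Hypothetical sketch of a MAC forwarding table backed by a Python dict
# (itself a hash table). All names are illustrative assumptions.


class MacForwardingTable:
    def __init__(self):
        self._entries = {}  # key: MAC address string, value: interface name

    def lookup(self, mac):
        """Return the I/F for this MAC, or None when no entry exists."""
        return self._entries.get(mac)

    def learn(self, mac, interface):
        """Insert (or refresh) a MAC-I/F pair."""
        self._entries[mac] = interface

    def forget(self, mac):
        """Delete an old association, if present."""
        self._entries.pop(mac, None)


def handle_packet(table, src_mac, dst_mac, ingress_if):
    # Assumption: learn from the source MAC seen on the ingress interface.
    table.learn(src_mac, ingress_if)
    # Every packet costs one lookup on the destination MAC.
    return table.lookup(dst_mac)


table = MacForwardingTable()
# Destination not yet known, so the lookup misses.
first = handle_packet(table, "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02", "eth0")
# The reply finds the entry learned from the first packet.
second = handle_packet(table, "bb:bb:bb:bb:bb:02", "aa:aa:aa:aa:aa:01", "eth1")

# Rough rate arithmetic under the stated framing assumption.
LINE_RATE_BPS = 100e9
WIRE_BYTES_PER_PACKET = 64 + 20
lookups_per_sec = LINE_RATE_BPS / (WIRE_BYTES_PER_PACKET * 8)
updates_per_sec = 100_000
lookup_to_update_ratio = lookups_per_sec / updates_per_sec
```

Under these assumptions the lookup rate comes out just under 150 million per second, so lookups outnumber insertions and deletions by three orders of magnitude.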
A hash table is a lookup data structure used to store key-value pairs that are indexed using the key. Hash tables strive to provide lookup and storage algorithms whose execution cost stays the same regardless of how many entries are stored; this is referred to in big-O notation as O(1). Big-O notation, O(x), denotes the cost of an operation, where x is the dominant factor that determines the cost. For example, if the cost of the lookup operation is A*n+B, where n is the number of entries in the data structure, then the lookup cost can be expressed as O(n). O(1) indicates a constant cost of the operation. Hash tables are generally chosen when the number of possible keys is much greater than the number of key-value pairs expected to be stored at any given time. In the case where the number of key-value pairs is of the same order of magnitude as the number of possible keys, a simple array implementation indexed by the key provides O(1) lookup and O(1) storage cost.
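To make the array alternative concrete, the following sketch stores values in an array indexed directly by the key. The 12-bit key width and the function names are assumptions chosen only for illustration. The last lines show why the same approach is impractical for MAC addresses, which are 48 bits wide.

```python
# Hypothetical direct-indexed table for a small key space.
KEY_BITS = 12  # assumed key width, for illustration only
direct_table = [None] * (1 << KEY_BITS)  # one slot per possible key


def direct_store(key, value):
    direct_table[key] = value  # O(1): a single indexed write


def direct_lookup(key):
    return direct_table[key]  # O(1): a single indexed read


direct_store(7, "eth0")

# A 48-bit MAC key space would need one slot per possible address.
MAC_KEY_SPACE = 1 << 48
slots_small = len(direct_table)
```

With 4,096 possible keys the array is trivial to allocate. With 2^48 possible keys it is not, which is where a hash table becomes the better choice.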
A hash table is implemented along the following lines. A hash function takes a key and generates a hash value (also referred to as hash index). In general, a hash function is a many-to-one function in the sense that the hash function will generate the same hash index for many keys. In this way, the hash function maps a larger key space into a smaller hash index space. The choice of which hash function to use is driven by how well the keys are distributed across the different hash indexes. The hash index is then used to index an array of hash buckets, each storing a key-value pair or linking to another data structure. Since the hash function is a many-to-one function, multiple key-value pairs (e.g. key1-value1 and key2-value2) can map to the same hash index and the same corresponding hash bucket. This is termed a hash collision.
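The many-to-one mapping and the resulting collision can be shown with a deliberately simple hash function. The modulo hash, the bucket count of 8, and the sample addresses are assumptions made for illustration; a production hash function would be chosen for how evenly it spreads real keys.

```python
NUM_BUCKETS = 8  # assumed size of the hash bucket array


def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to its 48-bit integer value."""
    return int(mac.replace(":", ""), 16)


def hash_index(mac):
    # Many-to-one: 2**48 possible keys are folded into NUM_BUCKETS indexes.
    return mac_to_int(mac) % NUM_BUCKETS


key1 = "00:00:00:00:00:01"
key2 = "00:00:00:00:00:09"
key3 = "00:00:00:00:00:02"

# key1 and key2 differ, yet both land in the same bucket: a hash collision.
collides = hash_index(key1) == hash_index(key2)
```

Here `key1` and `key2` both hash to index 1, while `key3` hashes to index 2.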
Hash collision resolution refers to how hash collisions are handled when they occur. With hash collisions, it is no longer sufficient to just compute the hash index from a given key. Instead, further work must be done to differentiate between the multiple keys that can hash into the same bucket.
One class of hash collision resolution techniques, termed open addressing, involves checking alternate locations in the hash table until an empty slot in the array is found. In open addressing, the first tier hash table buckets do not contain lower tier hash tables. Each of the individual open addressing schemes differs in how the series of alternate locations in the array are picked for probing. However, in all such schemes the number of additional probes is not deterministic and therefore the lookup cost is not O(1).
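As one example of an open addressing scheme, the sketch below uses linear probing, in which the alternate locations are simply the next slots in the array. The class name, the integer keys, and the modulo hash are assumptions for illustration, and deletion is omitted because it requires extra bookkeeping. The lookup returns the number of probes so that the variable cost is visible.

```python
class LinearProbingTable:
    """Hypothetical open-addressing table using linear probing."""

    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots  # each slot is None or (key, value)

    def _home(self, key):
        return key % len(self.slots)  # assumed simple hash for integer keys

    def insert(self, key, value):
        n = len(self.slots)
        for step in range(n):
            i = (self._home(key) + step) % n
            # Take the first empty slot, or overwrite the same key.
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def lookup(self, key):
        """Return (value, probes); value is None when the key is absent."""
        n = len(self.slots)
        for step in range(n):
            entry = self.slots[(self._home(key) + step) % n]
            if entry is None:
                return None, step + 1  # empty slot ends the probe sequence
            if entry[0] == key:
                return entry[1], step + 1
        return None, n


probe_table = LinearProbingTable()
for k, iface in [(1, "eth0"), (9, "eth1"), (17, "eth2")]:
    probe_table.insert(k, iface)  # all three keys share home slot 1
```

Looking up key 1 takes one probe, key 9 takes two, and key 17 takes three, so the cost depends on which colliding keys arrived earlier.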
Another class of hash collision resolution techniques involves constructing secondary data structures (such as a linked list, height-balanced tree, or even another hash table) that hang from each hash bucket in the first tier hash table. The colliding key-value pairs are placed into the secondary data structure according to the algorithms associated with that data structure. In the case of a linked list, the secondary data structure must be traversed until the given key is found, and since the key-value pair may be first or last in the linked list, the total lookup cost is non-deterministic. In general, the cost of storage and lookup depends on the secondary data structure and therefore the associated costs are not necessarily O(1).
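The secondary-structure approach can be sketched with one chain per bucket. For brevity this sketch uses a Python list as the chain in place of a true linked list; the traversal cost behaves the same way. The class name, integer keys, and modulo hash are assumptions for illustration. The lookup returns the number of key comparisons performed.

```python
class ChainedTable:
    """Hypothetical hash table with a per-bucket chain of key-value pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[key % len(self.buckets)]  # assumed simple hash

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))  # colliding keys extend the chain

    def lookup(self, key):
        """Return (value, comparisons); value is None when absent."""
        chain = self._chain(key)
        for comparisons, (existing_key, value) in enumerate(chain, start=1):
            if existing_key == key:
                return value, comparisons
        return None, len(chain)


chained = ChainedTable()
for k, iface in [(1, "eth0"), (9, "eth1"), (17, "eth2")]:
    chained.insert(k, iface)  # all three keys hash to bucket 1
```

The first key in the chain is found after one comparison and the last after three, which is the non-deterministic lookup cost described above.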
Most hash table implementations try to optimize the cost of all hash table operations: 1) insertion of a new key-value pair, 2) lookup of a key-value pair, and 3) deletion of a key-value pair. However, such generic solutions are not well suited for use in a networking application that is lookup-intensive.