Associative data structures are used in numerous computing and networking applications to store electronic data. For example, in network applications, an associative data structure may be used to store a table of information (data values) that includes data corresponding with various entities operating in a network. Such data values may be stored using keys, such as an address of a particular network entity. The data values are typically associated with the keys, or network addresses, in the associative data structure. Such network addresses (keys) may take the form of Media Access Control (MAC) addresses, Ethernet addresses, Internet Protocol (IP) addresses, or any other appropriate identifying information that may be used as a key to identify a particular network entity or to associate a particular data value with a respective network entity.
Such associative data structures may be “fully-associative” or “set-associative.” An example of a fully-associative data structure is a content addressable memory (CAM). In a CAM, data values are indexed using a complete key, such as an entire MAC address, for example. Because CAM structures are fully associative, they may operate with one-hundred percent utilization. One-hundred percent utilization, in this context, means that a CAM structure with N entries will store the first N entries it receives without a “miss” or insertion failure occurring.
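The fully-associative behavior described above can be illustrated with a minimal software sketch. This is only a model under stated assumptions: the class name `FullyAssociativeTable` and its methods are hypothetical, and a real CAM compares the key against all entries in parallel hardware rather than scanning them in a loop.

```python
class FullyAssociativeTable:
    """Software model of a fully-associative store (hypothetical names).

    Any key may occupy any entry, so the first N distinct keys always fit.
    """

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = []  # list of (key, value) pairs, in no fixed position

    def insert(self, key, value):
        # Match on the complete key; a hardware CAM would do this in parallel.
        for i, (stored_key, _) in enumerate(self.entries):
            if stored_key == key:
                self.entries[i] = (key, value)
                return True
        if len(self.entries) >= self.num_entries:
            return False  # miss only once every entry is in use
        self.entries.append((key, value))
        return True

    def lookup(self, key):
        for stored_key, value in self.entries:
            if stored_key == key:
                return value
        return None
```

In this model, a table with four entries accepts the first four distinct keys and rejects only the fifth, which corresponds to one-hundred percent utilization.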
While CAMs may operate with one-hundred percent utilization, they are expensive to implement and complex to design. For instance, CAMs may consume a large amount of dynamic power and semiconductor area in integrated circuit applications and include complex circuitry for accessing their entries as compared to other types of associative data structures, such as set-associative data structures.
One example of a set-associative data structure is a hash table. In hash tables, data values are indexed based on the result of a hash function that may be applied to the corresponding keys for the data values. The hash function produces an index (which has fewer bits than the key) in a given set of indexes to a “bucket” of the hash table. Each bucket of the hash table includes a certain number of entries and each bucket generally has the same number of entries. Hash tables may be referred to as set-associative because each bucket may include a set of data values with keys that “hash” to the same index. The keys and data values may then be stored in (and later read from) the bucket of the hash table corresponding with the index produced by applying the hash function to the key. Because the range of indexes produced by a hash function is typically predetermined and fixed (as compared to network addresses used as keys), the circuitry for accessing the entries of a hash table may be substantially less complex than corresponding circuitry in a CAM.
Because hash tables store keys and data values based on indexes that have fewer bits than the keys, it follows that multiple keys will hash to the same index. Accordingly, collisions may occur in a hash table. A collision occurs, for example, when an attempt is made to insert a data value in a hash table in a bucket that is full (i.e., all entries in the bucket have previously stored data values). Accordingly, unlike fully-associative data structures, hash tables do not operate with one-hundred percent utilization.
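The bucket indexing and collision behavior of a set-associative hash table can be sketched in a few lines. This is an illustrative model only: the class name `BucketedHashTable`, the use of CRC-32 as the hash function, and the default sizes are assumptions made for the example, not details taken from the description above.

```python
import zlib


class BucketedHashTable:
    """Set-associative table: fixed number of buckets, fixed entries per bucket."""

    def __init__(self, num_buckets=256, entries_per_bucket=8):
        self.num_buckets = num_buckets
        self.entries_per_bucket = entries_per_bucket
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key: bytes) -> int:
        # Assumed hash function: CRC-32 reduced to the bucket range.
        # The index has fewer bits than the key, so distinct keys can share it.
        return zlib.crc32(key) % self.num_buckets

    def insert(self, key: bytes, value) -> bool:
        bucket = self.buckets[self._index(key)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)  # update an existing key in place
                return True
        if len(bucket) >= self.entries_per_bucket:
            return False  # collision: the bucket is full, so this is a miss
        bucket.append((key, value))
        return True

    def lookup(self, key: bytes):
        # Only one bucket is searched, which keeps the access logic simple.
        for stored_key, value in self.buckets[self._index(key)]:
            if stored_key == key:
                return value
        return None
```

Note that an insertion can fail while other buckets still have free entries, which is why such a table generally cannot reach one-hundred percent utilization.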
Two common measures of utilization efficiency for hash tables are first miss utilization (FMU) and address insert utilization (AIU). FMU is a measure of the number of entries that a hash table stores before the first collision or miss occurs (i.e., a data value fails to get stored in the hash table). For instance, if a hash table has N entries and F entries are stored in the hash table before the first miss occurs, the FMU of the hash table may be expressed as F/N. As an example, assuming random keys are inserted, a hash table with 2k entries (2048 entries) in 256 buckets (i.e., with eight entries per bucket) may operate with a mean FMU of 34%. If other patterns of keys are used (e.g., non-random), the utilization efficiencies may vary as compared to those for random keys.
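The F/N definition of FMU can be estimated with a short simulation. The sketch below assumes an idealized hash function, modeled by drawing a uniformly random bucket index for each inserted key; the function names, trial count, and seed are hypothetical choices for illustration, and results for a real hash function or non-random keys may differ.

```python
import random


def first_miss_utilization(num_buckets=256, entries_per_bucket=8, rng=None):
    """Return F/N for one run: entries stored before the first miss."""
    rng = rng or random.Random()
    capacity = num_buckets * entries_per_bucket  # N
    fill = [0] * num_buckets
    stored = 0  # F
    while True:
        # Assumption: a random key hashes to a uniformly random bucket.
        bucket = rng.randrange(num_buckets)
        if fill[bucket] == entries_per_bucket:
            return stored / capacity  # first miss: target bucket is full
        fill[bucket] += 1
        stored += 1


def mean_fmu(trials=200, seed=0, **table_shape):
    """Average FMU over several independent runs."""
    rng = random.Random(seed)
    total = sum(first_miss_utilization(rng=rng, **table_shape)
                for _ in range(trials))
    return total / trials
```

Calling `mean_fmu()` with the default 256 buckets of eight entries gives an estimate that can be compared against the example figure quoted above.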
AIU is a measure of overall utilization of a hash table. For instance, in a hash table that has N entries, if N attempts are made to insert entries in the hash table and M of those entries are successfully stored in the hash table (i.e., N−M misses occur), the AIU for the hash table may be expressed as M/N. Using the same example as above (a 2k hash table with 256 buckets and eight entries per bucket), such a hash table may operate with a mean AIU of 86% assuming random keys are inserted.
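The M/N definition of AIU lends itself to a similar simulation. As a sketch under the assumption of an idealized hash function (each random key lands in a uniformly random bucket), the hypothetical function below makes N insertion attempts and counts how many succeed; the names, trial count, and seed are illustrative only.

```python
import random


def address_insert_utilization(num_buckets=256, entries_per_bucket=8, rng=None):
    """Return M/N for one run: N insert attempts, M of them stored."""
    rng = rng or random.Random()
    capacity = num_buckets * entries_per_bucket  # N
    fill = [0] * num_buckets
    stored = 0  # M
    for _ in range(capacity):
        # Assumption: a random key hashes to a uniformly random bucket.
        bucket = rng.randrange(num_buckets)
        if fill[bucket] < entries_per_bucket:
            fill[bucket] += 1
            stored += 1
        # Otherwise the bucket is full and this attempt counts as a miss.
    return stored / capacity


def mean_aiu(trials=20, seed=0, **table_shape):
    """Average AIU over several independent runs."""
    rng = random.Random(seed)
    total = sum(address_insert_utilization(rng=rng, **table_shape)
                for _ in range(trials))
    return total / trials
```

Unlike the FMU measurement, this loop continues past the first miss, so the result reflects how full the table is after all N attempts.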
It will be appreciated that utilization numbers for hash tables may vary depending on a number of factors, such as the particular arrangement of the hash table and the hash function, among any number of other factors. The utilization numbers referenced above are given by way of example for purposes of illustration and comparison.
Depending on the particular embodiment, a hash table with an FMU on the order of 35% may not be sufficient for satisfactory operation of a system in which the hash table is used. Likewise, in other applications, a hash table with an AIU on the order of 85% may not be sufficient. Furthermore, using a fully-associative structure in place of a hash table in such situations may not be a cost-effective solution due to the design complexity and physical size of such data structures.