Hash tables are widely used to increase the speed and accuracy of storing, accessing, or otherwise manipulating data. For example, hash tables and related techniques may be used to quickly identify and access desired data within a very large database. In other examples, hash tables and related techniques may be used when routing packets within a communications network, in order to identify the destination of each packet within the network.
In conventional hash tables, a hash function is used to convert a piece of data of potentially variable length into a value of fixed size that identifies a specified location within the resulting hash table. The variable-length or other input data may be referred to as the key, and the hash function may convert the key into a hash index value within the hash table, at which a desired data value may be stored, accessed, or otherwise manipulated.
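As an illustration of converting a variable-length key into a fixed-size value and then into a hash index value, the following minimal sketch uses the 32-bit FNV-1a hash. FNV-1a is an assumption chosen only because it is short and well known; the text does not name a particular hash function, and `hash_index` and `table_size` are hypothetical names.

```python
# Sketch: a hash function maps a variable-length key to a fixed-size
# 32-bit value, which is then reduced to an index in [0, table_size).
# FNV-1a is used purely for illustration (an assumption).

FNV_OFFSET_BASIS = 0x811C9DC5  # 32-bit FNV offset basis
FNV_PRIME = 0x01000193         # 32-bit FNV prime


def fnv1a_32(key: str) -> int:
    """Return the 32-bit FNV-1a hash of the UTF-8 bytes of `key`."""
    h = FNV_OFFSET_BASIS
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep the value fixed at 32 bits
    return h


def hash_index(key: str, table_size: int) -> int:
    """Map `key` to a hash index value within a table of `table_size` slots."""
    return fnv1a_32(key) % table_size


if __name__ == "__main__":
    # Keys of very different lengths all land in the same fixed index range.
    for name in ["Al", "Alexandra Montgomery-Smith"]:
        print(name, "->", hash_index(name, 1024))
```

However long the key, the result is always a fixed-width value, and the modulo step confines it to the table's index range.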
In a specific example, the hash table may be used to associate individual names with corresponding telephone numbers. In such an example, the individual names may represent the key in the above description, and the hash function may convert each name into a hash index value at which the corresponding telephone number (i.e., the data value) may be stored. The resulting hash table may store thousands, millions, or more data records associating names with telephone numbers. Nonetheless, the hash function may be chosen such that the index value corresponding to each input key is unique, or nearly unique, within the hash table. Consequently, when a user later wishes to access the stored data (e.g., a desired telephone number), it is only necessary to re-execute the hash function on the specified individual name. The resulting unique or nearly unique hash index value may then be located within the hash table, and the telephone number stored in association with it may be identified.
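The name-to-telephone-number example can be sketched as a table with one entry per hash index value: storing a number hashes the name to an index, and lookup simply re-executes the hash on the name. The class name `PhoneDirectory` is hypothetical, and `zlib.crc32` merely stands in for "the hash function"; this is an illustrative sketch, not a prescribed design. Each slot keeps the name alongside the number so a lookup can confirm it found the right record.

```python
import zlib
from typing import Optional


class PhoneDirectory:
    """Hypothetical single-entry-per-index hash table mapping names to numbers."""

    def __init__(self, table_size: int) -> None:
        self.table_size = table_size
        # Each slot holds a (name, number) pair, or None if empty.
        self.slots: list[Optional[tuple[str, str]]] = [None] * table_size

    def _index(self, name: str) -> int:
        # zlib.crc32 stands in for "the hash function" (an assumption).
        return zlib.crc32(name.encode("utf-8")) % self.table_size

    def put(self, name: str, number: str) -> bool:
        """Store the number; return False if a different name holds the slot."""
        i = self._index(name)
        slot = self.slots[i]
        if slot is not None and slot[0] != name:
            return False  # two keys mapped to the same index
        self.slots[i] = (name, number)
        return True

    def get(self, name: str) -> Optional[str]:
        """Re-run the hash function on `name` and read the number at that index."""
        slot = self.slots[self._index(name)]
        if slot is not None and slot[0] == name:
            return slot[1]
        return None
```

For example, after `d = PhoneDirectory(1024)` and `d.put("Alice Jones", "555-0100")`, calling `d.get("Alice Jones")` touches exactly one slot regardless of how many records are stored.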
In practice, existing hash functions may be unable to provide a unique relationship between each key and each stored data value. Instead, the hash function may sometimes map two different keys to the same hash index value (which is unavoidable, for example, whenever there are more keys than hash index values), resulting in multiple stored values being associated with that hash index value.
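The following sketch finds two distinct keys that a hash function maps to the same hash index value. With more distinct keys than available indices, such a pair must exist by the pigeonhole principle, whatever hash function is used. `zlib.crc32` and the synthetic `person-N` keys are assumptions for illustration only.

```python
import zlib
from typing import Optional


def hash_index(key: str, table_size: int) -> int:
    # zlib.crc32 stands in for an arbitrary hash function (an assumption).
    return zlib.crc32(key.encode("utf-8")) % table_size


def find_collision(keys: list[str], table_size: int) -> Optional[tuple[str, str]]:
    """Return the first pair of distinct keys that share a hash index, if any."""
    seen: dict[int, str] = {}  # hash index -> first key observed there
    for key in keys:
        i = hash_index(key, table_size)
        if i in seen and seen[i] != key:
            return seen[i], key
        seen[i] = key
    return None


if __name__ == "__main__":
    # Nine distinct keys into eight indices: a collision is guaranteed.
    names = [f"person-{n}" for n in range(9)]
    print(find_collision(names, table_size=8))
```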
In such cases, multiple entries may be associated with each hash index value within the hash table. In the example above, a given hash index value may be associated with two entries, so that even if the hash function maps two keys (individual names) to that hash index value, each of the two entries may be used to store a corresponding data value (e.g., a telephone number). Later, when a user wishes to access the data value (telephone number) for one of those keys, the hash function may be used to map the key (name) to the hash index value having two entries. In this case, it is necessary to distinguish between the two entries in order to return the desired data to the user, for example by comparing the key stored with each entry against the requested key. Nonetheless, the hash table in this example still provides a significant advantage over many other data access techniques by quickly providing the user with the two entries of the hash index value. In this way, only these two entries need be considered to identify the desired data value, rather than, e.g., the entire data set.
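A hash table with a fixed number of entries per hash index value can be sketched as below. Each entry stores its key so that a lookup can distinguish between the entries sharing an index; comparing stored keys is one common way to do this and is an assumption here. The class `BucketHashTable`, its parameters, and the use of `zlib.crc32` are hypothetical.

```python
import zlib
from typing import Optional


class BucketHashTable:
    """Hypothetical hash table whose indices each hold a fixed-size set of entries."""

    def __init__(self, num_buckets: int, entries_per_bucket: int = 2) -> None:
        self.num_buckets = num_buckets
        self.entries_per_bucket = entries_per_bucket
        # One list of (key, value) entries per hash index value.
        self.buckets: list[list[tuple[str, str]]] = [[] for _ in range(num_buckets)]

    def _index(self, key: str) -> int:
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def put(self, key: str, value: str) -> bool:
        """Store (key, value); return False if the index has no free entry."""
        bucket = self.buckets[self._index(key)]
        for n, (k, _) in enumerate(bucket):
            if k == key:
                bucket[n] = (key, value)  # update an existing entry in place
                return True
        if len(bucket) >= self.entries_per_bucket:
            return False
        bucket.append((key, value))
        return True

    def get(self, key: str) -> Optional[str]:
        """Hash to one index, then compare keys against only its few entries."""
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None
```

Only the entries at a single index are scanned on lookup, which mirrors the point above that just those entries, not the whole data set, need be considered.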
In the terminology of the art, a hash index value having multiple entries as described above may be referred to as a “bucket,” because the hash index value may be considered a bucket of entries that may potentially be used to store data values associated with that hash index value. Although the example above described a bucket including only two entries, a larger number of entries may be provided in a given bucket of a hash table. However many entries each bucket contains, as the hash table is filled with more and more data over time, the hash function may eventually associate more keys with a specific hash index value (bucket) than there are entries in that bucket. In such a case, it may be impossible to store a subsequent data value within the bucket. Such a circumstance may be referred to, for example, as a miss or a collision. Consequently, a metric known as the “first miss utilization (FMU)” has been developed to describe the efficiency or other utility of a given hash table and associated hashing techniques. That is, the FMU characterizes the hash table at the point when the first such miss occurs during population or other access of the hash table in question. For example, the FMU may refer to the percentage of the hash table that may reliably be filled with data before a first such miss occurs.
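One way to estimate an FMU-style figure is to insert distinct keys into a bucketed table until the first insertion fails because its bucket is full, then report the fraction of total entries occupied at that point. This minimal sketch assumes synthetic keys (`key-0`, `key-1`, ...), `zlib.crc32` as the hash function, and a single fill sequence; the function name `first_miss_utilization` is hypothetical and the measurement procedure is an illustrative assumption, not a standard definition.

```python
import zlib


def first_miss_utilization(num_buckets: int, entries_per_bucket: int) -> float:
    """Fill a bucketed table with distinct keys until the first miss.

    Returns the fraction of total entries occupied when the first insertion
    failed because its bucket was already full.
    """
    capacity = num_buckets * entries_per_bucket
    occupancy = [0] * num_buckets  # entries used in each bucket
    stored = 0
    i = 0
    while True:
        key = f"key-{i}"  # hypothetical synthetic keys, all distinct
        b = zlib.crc32(key.encode("utf-8")) % num_buckets
        if occupancy[b] == entries_per_bucket:
            return stored / capacity  # first miss reached
        occupancy[b] += 1
        stored += 1
        i += 1
```

The loop always terminates: once every bucket is full, the next key must miss. Comparing, for instance, `first_miss_utilization(256, 1)` with `first_miss_utilization(64, 4)` (same total capacity) gives a rough sense of how bucket size affects the utilization reached before the first miss; averaging over several key sequences would give a steadier estimate.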