Since the beginning of the computing industry, efficient data storage and retrieval have been key concerns. As computing becomes more integrated into people's lives, the amount of data gathered and stored is growing exponentially. Furthermore, this data is being gathered and used by ever smaller devices with limited storage space and processing power.
One data structure that computer scientists have developed to speed up data storage and retrieval is the hash table. A hash table is a data structure with multiple “slots” that can store data, each slot having an index value. A data item, or a key identifying the data item, can be “hashed” with a hash function. A hash function can take data of variable sizes or types and return a number of consistent size. This number can be used to select an index in the hash table. If the number is larger than the highest slot index, the number can be reduced modulo the size of the hash table to select the hash table index. When a retrieval operation for this data item needs to be performed, the data item can quickly be located by performing similar operations on the data item key to compute the index at which the data item was stored.
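The hashing and index selection described above can be sketched as follows. This is a minimal illustration only, not the technique introduced here: the table size, the choice of CRC-32 as the hash function, and the names `slot_index`, `store`, and `retrieve` are all assumptions made for the example, and collisions are deliberately left unhandled.

```python
import zlib

TABLE_SIZE = 8  # hypothetical number of slots, chosen for illustration

# One slot per index; None marks an empty slot.
table = [None] * TABLE_SIZE


def slot_index(key: str, table_size: int = TABLE_SIZE) -> int:
    """Hash a variable-length key to a fixed-size number, then reduce it to a slot index."""
    hashed = zlib.crc32(key.encode("utf-8"))  # always an unsigned 32-bit integer
    return hashed % table_size                # modulus keeps the index within the table


def store(key: str, value) -> None:
    """Store the item at the index computed from its key (overwrites on collision)."""
    table[slot_index(key)] = (key, value)


def retrieve(key: str):
    """Recompute the index from the key and return the stored value, or None."""
    entry = table[slot_index(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None
```

Because retrieval recomputes the same index from the key, no other slot needs to be examined, which is the source of the speed advantage.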
In some cases, a second data item key can hash to the same hash table index as a key for a data item already stored in the hash table. This can result in a collision. A “collision,” as used herein, occurs (A) in a storage operation when another data item is already stored at an index computed for a data item or (B) in a retrieval operation when a data item being retrieved is not stored at an index computed for the data item. Collisions occur because multiple data items may have keys that hash to the same or similar values (e.g., are clustered). Algorithms for collision resolution have been developed, such as separate chaining and open addressing with various probing sequences such as linear probing and quadratic probing, including open addressing variants such as Robin Hood hashing.
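As one illustration of conventional collision resolution, open addressing with linear probing can be sketched as follows. The class name `LinearProbingTable`, its methods, the default CRC-32 hash, and the table size are assumptions made for this example. Deletion is omitted because it would require additional handling (e.g., marking vacated slots) that is beyond this sketch.

```python
import zlib


class LinearProbingTable:
    """Illustrative open-addressing hash table using linear probing."""

    def __init__(self, size=8, hash_fn=None):
        self.size = size
        self.slots = [None] * size  # None marks an empty slot
        # Default hash is an assumption for the sketch; any function returning
        # a non-negative integer can be supplied instead.
        self.hash_fn = hash_fn or (lambda key: zlib.crc32(str(key).encode("utf-8")))

    def _home(self, key):
        # Index computed directly from the key, before any probing.
        return self.hash_fn(key) % self.size

    def put(self, key, value):
        """Store the item, stepping to the next slot on each storage collision."""
        home = self._home(key)
        for step in range(self.size):
            probe = (home + step) % self.size
            entry = self.slots[probe]
            if entry is None or entry[0] == key:
                self.slots[probe] = (key, value)
                return probe  # index where the item was actually stored
        raise RuntimeError("hash table is full")

    def get(self, key):
        """Follow the same probe sequence used by put until the key or an empty slot is found."""
        home = self._home(key)
        for step in range(self.size):
            probe = (home + step) % self.size
            entry = self.slots[probe]
            if entry is None:
                return None  # an empty slot ends the probe sequence: key is absent
            if entry[0] == key:
                return entry[1]
        return None
```

In this sketch, both kinds of collision defined above appear: `put` encounters an occupied slot at the computed index, and `get` finds a different item at the computed index and must continue probing.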
Although programs running on computing devices can store data to a storage device in a next available slot, when that data is retrieved the entire storage device may have to be searched to locate the data, significantly decreasing performance.
The techniques introduced here may be better understood by referring to the following Detailed Description in conjunction with the accompanying drawings, in which like reference numerals indicate identical or functionally similar elements.