A hash table is a data structure that maps keys to values. A hash function is applied to the key to compute an index to an entry in the hash table, where the corresponding value (the data to be retrieved) can be found.
The hash function further depends on the hash table's size. Thus, given a key and a table size, the index is computed by: index=hash-function(key, table-size), where the index is between one and the number of entries in the table.
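The index computation described above can be sketched in Python as follows. This is a minimal illustration rather than a prescribed implementation: the name `hash_function` mirrors the formula in the text, while the use of CRC-32 as the underlying hash and the modulo reduction are assumptions chosen for the example. The result is shifted by one so that the index lies between one and the number of entries.

```python
import zlib


def hash_function(key: str, table_size: int) -> int:
    """Map a key to an index in the range 1..table_size (inclusive)."""
    if table_size < 1:
        raise ValueError("table_size must be at least 1")
    # CRC-32 is an assumed stand-in for "a hash function"; any function that
    # spreads keys uniformly could be used instead.
    raw = zlib.crc32(key.encode("utf-8"))
    # Reduce to 0..table_size-1, then shift to the 1-based range used in the text.
    return raw % table_size + 1
```

The same key and table size always yield the same index, while changing the table size generally changes the index, which is why the function takes both arguments.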
A good hash function should spread the computed indices uniformly across the table. In practice, however, the hash function sometimes produces the same index for different keys, a situation called a hash collision.
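As an illustration of hash collisions, the sketch below groups distinct keys by their computed index and reports every index shared by two or more keys. The helper name `find_collisions` is hypothetical, and the CRC-32 based index computation is an assumption made for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def find_collisions(keys: Iterable[str], table_size: int) -> Dict[int, List[str]]:
    """Group distinct keys by index; keep only indices shared by 2+ keys."""
    by_index = defaultdict(list)
    for key in keys:
        # Assumed hash: CRC-32 reduced to the 1-based range 1..table_size.
        index = zlib.crc32(key.encode("utf-8")) % table_size + 1
        by_index[index].append(key)
    return {i: ks for i, ks in by_index.items() if len(ks) > 1}
```

When the number of distinct keys exceeds the number of entries, at least one collision is unavoidable regardless of the quality of the hash function, since some entry must receive more than one key.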
The chance for collisions is determined not only by the quality of the function, but also by the ratio between the number of entries (x) in the table and the number of keys (y) that the hash table needs to support.
For a good hash function, the chance for collisions can be expressed through the probability that a given entry receives none of the y keys: (1−1/x)**y.
A larger ratio (a larger number of entries compared to the number of keys) decreases the number of collisions. In a case where the number of entries is equal to the number of keys, y=x, the formula yields (1−1/x)**x≈1/e≈0.37 for large x (e≈2.718), i.e., around 1/3 to 1/2 of the keys are subject to collision.
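The figures above can be checked numerically. The sketch below evaluates (1−1/x)**x directly and also simulates throwing y keys uniformly at random into x entries. Both function names are hypothetical, the uniform random index is an idealized stand-in for a good hash function, and counting a key as "subject to collision" when it lands in an already occupied entry is an assumption about how the text counts collisions.

```python
import math
import random


def limit_term(x: int) -> float:
    """Evaluate (1 - 1/x)**x, which approaches 1/e as x grows."""
    return (1.0 - 1.0 / x) ** x


def simulate_overflow_fraction(x: int, y: int, seed: int = 0) -> float:
    """Place y keys uniformly into x entries; return the fraction of keys
    that land in an entry already holding another key."""
    rng = random.Random(seed)
    occupied = set()
    overflow = 0
    for _ in range(y):
        index = rng.randrange(x)  # idealized uniform hash (assumption)
        if index in occupied:
            overflow += 1
        else:
            occupied.add(index)
    return overflow / y


if __name__ == "__main__":
    n = 100_000
    print(limit_term(n), 1 / math.e)
    print(simulate_overflow_fraction(n, n))      # entries equal to keys
    print(simulate_overflow_fraction(4 * n, n))  # four times as many entries
```

Under these assumptions, the simulated fraction for y=x comes out close to 1/e, and it drops noticeably when the table has four times as many entries as keys.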
All collision resolution strategies require that the keys be stored in the table entries together with the associated values, so that a lookup can check whether a produced index points to an entry holding the requested key.
Popular collision resolution strategies employ linked lists for storing the collided keys and corresponding values. For example, each entry in the hash table may point to the head of a linked list. The linked list requires that, in addition to each key-value pair, a pointer also be stored to refer to the next key-value pair with the same produced index.
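The linked-list strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the class names `Node` and `ChainedHashTable` are hypothetical, Python's built-in `hash` stands in for the hash function, and indices are 0-based internally because Python lists are 0-based. Each node stores the key, the value, and the extra pointer to the next pair with the same index.

```python
from dataclasses import dataclass
from typing import Any, Hashable, List, Optional


@dataclass
class Node:
    key: Hashable
    value: Any
    next: Optional["Node"] = None  # pointer to the next pair with the same index


class ChainedHashTable:
    def __init__(self, table_size: int) -> None:
        if table_size < 1:
            raise ValueError("table_size must be at least 1")
        # Each entry holds the head of a linked list (None when empty).
        self._heads: List[Optional[Node]] = [None] * table_size

    def _index(self, key: Hashable) -> int:
        return hash(key) % len(self._heads)

    def put(self, key: Hashable, value: Any) -> None:
        i = self._index(key)
        node = self._heads[i]
        while node is not None:
            if node.key == key:  # stored key is compared to detect a match
                node.value = value
                return
            node = node.next
        # Key not present: prepend a new node to the chain for this entry.
        self._heads[i] = Node(key, value, self._heads[i])

    def get(self, key: Hashable, default: Any = None) -> Any:
        node = self._heads[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return default
```

The comparison `node.key == key` is the reason the keys have to be stored alongside the values: the index alone cannot tell collided keys apart.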
Increasing the number of entries of the hash table reduces the chances for collisions, but consumes memory resources for storing the enlarged table. On the other hand, if the number of entries is reduced, memory is still consumed by the extra pointers required to manage the linked lists of the collided keys.
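The trade-off above can be made concrete with a rough estimate. The sketch below is a simplified model under loudly stated assumptions: the function name `chained_table_bytes` is hypothetical, every table entry holds only a head pointer, every key-value pair lives in a list node with one next pointer, pointers are 8 bytes by default, and allocator overhead is ignored.

```python
def chained_table_bytes(
    num_entries: int,
    num_keys: int,
    key_bytes: int,
    value_bytes: int,
    pointer_bytes: int = 8,  # assumed pointer size
) -> int:
    """Estimate the memory of a chained hash table (simplified model)."""
    # One head pointer per table entry, whether or not the entry is used.
    head_array = num_entries * pointer_bytes
    # One node per key: the key, the value, and the extra next pointer.
    nodes = num_keys * (key_bytes + value_bytes + pointer_bytes)
    return head_array + nodes
```

For example, with 1000 keys of 8 bytes and values of 8 bytes, this model gives 32000 bytes for 1000 entries and 24800 bytes for 100 entries. Shrinking the table saves only the head pointers, while the per-key next pointers (8000 bytes in this model) remain in both cases.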
There is a growing need to provide an effective hash-based solution to writing and reading data entities.