This application relates to hashing techniques used by computer systems and, more particularly, to hashing techniques employed in computer systems that may include multicore and/or multithread processor elements.
Hashing is a technique commonly used in sequential set implementations to ensure that these method calls take constant time on the average. A hash set, also known as a hash table, is an efficient way to implement sets of items. A hash set is typically implemented as an array, called a table. Each table element is a reference to one or more items. A hash function maps items to integers so that distinct items almost always map to distinct values. To add, remove, or test an item for membership, apply the hash function to the item, modulo the table size, to identify the table entry associated with that item. In other words, hash the item.
In some hash-based set algorithms, each table element refers to a single item, an approach known as open addressing. In others, each table element refers to a set of items, traditionally called a bucket; this approach is known as closed addressing.
Any hash set algorithm must deal with collisions. Collisions are when two distinct items hash to the same table entry. Open addressing algorithms typically resolve collisions by applying alternative hash functions to test alternative table elements. Closed addressing algorithms place colliding items in the same bucket, until that bucket becomes too full. In both types of algorithms, it is sometimes necessary to resize the table. In open addressing algorithms, the table may become too full to find alternative table entries, and in closed addressing algorithms, buckets may become too large to search efficiently. Open addressing schemes have the potential performance advantage of involving one level less of indirection in accessing an item in the table, since the item is placed directly in an array, not in a linked list implementing a bucket.
Cuckoo hashing is a sequential hashing algorithm in which a newly-added item displaces any earlier item occupying the same slot. For a hash set of size N=2k, where k is the number of items in a table (array), two arrays (e.g., table[0] and table[1]) that form a table and two independent hash functions (h0 and h1) are used to map the set of possible keys to entries in the arrays. To test whether a value, denoted as X, is contained in the set, the function contains(X) is used. This function tests whether either table[0][h0(X)] or table[1][h1(X)] results in X. Similarly, the remove function, remove(X), checks whether X is in either array (table[0][h0(X)] or table[1][h1(X)], i.e., by using the contains(X) function) and then removes X if it is found.
An example of a sequential cuckoo hashing add( ) method is shown below:
1  public boolean add(T x) {2   if ( contains (x)) {3     return false ;4   }5   for ( int i = 0; i < LIMIT; i++) {6     if ((x = swap(hash0(x), x)) == null) {7       return true;8     } else if ((x = swap(hash1(x), x)) == null) {9       return true;10    }11  }12  resize ( );13  add(x);14 }The add(x) method successively “kicks out” conflicting items until every key has a slot. To add x, swap x with y, the current occupant of table [0] [h0(x)] (Line 6). If the prior value y was null, the method is done (Line 7). Otherwise, swap the newly nest-less value y for the current occupant of table [1][h1(y)] in the same way (Line 8). As before, if the prior value was null, the method is done. Otherwise, continue swapping entries (alternating tables) until an empty slot is found.
An empty slot might not be found, either because the table is full, or because the sequence of displacements forms a cycle. An upper limit is therefore needed on the number of successive displacements that may be undertaken (Line 5). When this limit is exceeded, resize the hash table, choose new hash functions (Line 12), and start over (Line 13).
Sequential cuckoo hashing is attractive for its simplicity. It provides constant time contains( ) and remove( ) methods, and it may be shown that over time, the average number of displacements caused by each add( ) call will be constant. Experimental evidence shows that sequential cuckoo hashing works well in practice.