1. Field of the Invention
The present invention relates generally to an improved data processing system and in particular to a method and apparatus to locate data based on a key value. Still more particularly, the present invention relates to a computer implemented method, apparatus, and computer usable program code for an efficient hashing scheme.
2. Description of the Related Art
Many applications, such as databases, networking, and language models, require efficient lookup of a data item/record given some identifying key. Ideally, lookups should be fast, and the space used for storing the items/records should be as small as possible.
Hashing is a commonly used technique for providing access to data based on a key in constant expected time. Hashing is used in database systems for joins, aggregation, duplicate-elimination, and indexing. In a database context, hash keys are typically single-column values or a tuple of values for a small number of columns. Payloads in a hash table may be full records, pointers or record identifiers that refer to records, or values of an aggregate computed using hash aggregation.
FIG. 1 shows pseudo-code for a hash probe operation on a bucketized hash table with separate chaining. The code shown in FIG. 1 assumes that the hash table contains no duplicate keys and that zero is not a valid payload value.
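For orientation, the following is a minimal sketch of such a probe, written in Python for illustration; it is not a reproduction of FIG. 1. The names `Bucket`, `ChainedHashTable`, and `BUCKET_SIZE`, the bucket capacity of four slots, and the use of Python's built-in `hash` are all assumptions made here. As in FIG. 1, the sketch assumes no duplicate keys and treats zero as the "not found" result because zero is not a valid payload.

```python
BUCKET_SIZE = 4  # slots per bucket (assumed value, for illustration only)


class Bucket:
    """A fixed-size bucket with a link to an overflow bucket."""

    def __init__(self):
        self.keys = [None] * BUCKET_SIZE
        self.payloads = [0] * BUCKET_SIZE
        self.count = 0      # number of occupied slots
        self.next = None    # next bucket in the chain, if any


class ChainedHashTable:
    """Bucketized hash table with separate chaining (hypothetical sketch)."""

    def __init__(self, num_buckets=8):
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _head(self, key):
        # One hash function selects the first bucket of the chain.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, payload):
        # Caller must not insert a key twice; zero is reserved for "miss".
        assert payload != 0
        bucket = self._head(key)
        while bucket.count == BUCKET_SIZE:
            if bucket.next is None:
                bucket.next = Bucket()
            bucket = bucket.next
        bucket.keys[bucket.count] = key
        bucket.payloads[bucket.count] = payload
        bucket.count += 1

    def probe(self, key):
        # Walk the chain; each key comparison is a data-dependent branch,
        # and each hop to bucket.next may touch a new cache line.
        bucket = self._head(key)
        while bucket is not None:
            for i in range(bucket.count):
                if bucket.keys[i] == key:
                    return bucket.payloads[i]
            bucket = bucket.next
        return 0  # zero signals that the key is absent
```

The data-dependent comparisons and the pointer chasing along the chain in `probe` are the kind of operations that tend to cause branch mispredictions and cache misses.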
Thus, conventional hash tables perform poorly on processors, in terms of the time required for hashing, because of branch mispredictions, cache misses, and poor instruction-level parallelism. Conventional hash tables also have space overheads. Hashing schemes based on cuckoo hashing have previously been proposed to address these issues, but most of them address only the space overhead. Cuckoo hashing is a hashing scheme that allows a key to be placed in one of several locations in the hash table. With cuckoo hashing, some configurations achieve over 99 percent space efficiency; that is, over 99 percent of the space used is occupied by actual items/records.
Schemes based on cuckoo hashing need to compute multiple hash functions, and look in multiple slots of the hash table. It was therefore assumed that such a scheme would be less time-efficient than conventional hashing that uses one hash function and usually looks in one slot. Prior implementations of an extended cuckoo-hashing scheme require 1900 cycles/probe on a Pentium® 4 processor.
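To make the multiple-location idea concrete, the following is a minimal sketch of basic two-choice cuckoo hashing in Python. It is not the extended scheme referred to above, and it makes no attempt to reproduce any cycle counts. The class name `CuckooTable`, the second hash function `_h2`, the multiplier constant, the `max_kicks` limit, and the grow-by-doubling fallback are all assumptions chosen for illustration.

```python
class CuckooTable:
    """Basic cuckoo hashing with two candidate slots per key (sketch)."""

    def __init__(self, capacity=16, max_kicks=32):
        self.slots = [None] * capacity  # each slot: (key, payload) or None
        self.max_kicks = max_kicks      # displacement limit before growing

    def _h1(self, key):
        return hash(key) % len(self.slots)

    def _h2(self, key):
        # Assumed second hash: multiply by an odd constant and shift.
        return ((hash(key) * 2654435761) >> 7) % len(self.slots)

    def lookup(self, key):
        # A probe computes both hash functions and checks both slots.
        for pos in (self._h1(key), self._h2(key)):
            entry = self.slots[pos]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, payload):
        # Assumes the key is not already present.
        entry = (key, payload)
        pos = self._h1(key)
        for _ in range(self.max_kicks):
            if self.slots[pos] is None:
                self.slots[pos] = entry
                return
            # Evict the occupant, then move it to its alternate slot.
            self.slots[pos], entry = entry, self.slots[pos]
            h1, h2 = self._h1(entry[0]), self._h2(entry[0])
            pos = h2 if pos == h1 else h1
        self._grow(entry)  # too many displacements: rebuild a larger table

    def _grow(self, pending):
        # Reinsert every stored entry, plus the displaced one, at double size.
        old = [e for e in self.slots if e is not None] + [pending]
        self.slots = [None] * (2 * len(self.slots))
        for k, v in old:
            self.insert(k, v)
```

In this sketch every lookup evaluates two hash functions and inspects up to two slots, which is the extra work that led to the assumption that cuckoo-based schemes would be slower than a conventional single-hash probe.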