Storing records in a data table and retrieving the records are common tasks. Various data structures, table organizations, and access techniques have been utilized to determine a location for storing an element of data and to determine the location in which an element of data has been stored. In general, the data may be stored in a table of records or elements, where each element has a collection of fields associated with it. In the table, each field is associated with one of a number of attributes that, together, make up the element. One of the attributes is the key that refers to the element and on which the searching is based. Various techniques for organizing a table include lists, binary search trees, digital search trees and hash tables.
Hashing is a method that stores data in a data table such that storing, searching, retrieving, inserting, and deleting data can be done much faster than by traditional linear search methods. Hashing transforms keys using a hash function, and maps the hashed keys to data table locations. The situation where every key, when hashed, produces a unique index, and thus corresponds to a unique location in the hash table, is known as perfect hashing. Such a situation is difficult to achieve. Typically, two or more keys hash to the same physical location, or home location. This is known as a collision. For example, a simple hash function may be: key mod 11. For such a hash function, the keys 27 and 60 both result in a hash of 5 and thus map to the same home location. Consequently, the keys result in a collision.
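As an illustration only, the example hash function above can be sketched in a few lines of Python. The function name home_location and the table_size parameter are hypothetical names introduced for this sketch; only the function key mod 11 and the keys 27 and 60 come from the description above.

```python
def home_location(key: int, table_size: int = 11) -> int:
    """Return the home location of a key under the hash function key mod table_size."""
    return key % table_size


# The keys 27 and 60 map to the same home location, so they collide.
print(home_location(27))  # 5
print(home_location(60))  # 5
```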
When a collision occurs among a group of keys, the keys may be stored in a chain joined together by links. The link for one key indicates the location of the next key in the chain. In a typical hash table, therefore, each record includes at least a key field and a link field. The key field stores the search key, while the link field stores a location of a next key in the chain. In order to determine the locations at which the keys in the chain are stored, a collision resolution scheme is used. Different collision resolution schemes may result in a different number of links being traversed in order to find a particular key. Following a link involves an access to a secondary storage device, referred to as a probe. A measure of the efficiency of a collision resolution scheme is the average number of probes required to find the stored keys.
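A minimal sketch of how a chain might be followed, and how probes might be counted, is given below. It assumes each record is a (key, link) pair, that an empty slot is represented by None, and that the colliding key 60 has been placed in slot 10; the placement and the name probes_to_find are assumptions made for illustration and are not taken from any particular collision resolution scheme.

```python
from typing import List, Optional, Tuple

# Each slot holds (key, link); link is the index of the next slot in the chain, or None.
Slot = Optional[Tuple[int, Optional[int]]]


def probes_to_find(table: List[Slot], key: int, table_size: int = 11) -> Optional[int]:
    """Count the slots examined to find key, starting at its home location.

    Returns None if the chain ends without finding the key.
    """
    loc: Optional[int] = key % table_size
    probes = 0
    while loc is not None and table[loc] is not None:
        probes += 1
        stored_key, link = table[loc]
        if stored_key == key:
            return probes
        loc = link  # follow the link field to the next record in the chain
    return None


table: List[Slot] = [None] * 11
table[5] = (27, 10)     # 27 sits in its home location and links to slot 10
table[10] = (60, None)  # 60 collided with 27 (assumed placement)

print(probes_to_find(table, 27))  # 1
print(probes_to_find(table, 60))  # 2
```

Averaging this count over all stored keys gives the efficiency measure described above.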
For example, FIG. 1 depicts a conventional hash table 20. The conventional hash table 20 includes key fields 22, link fields 24, and other optional fields 26. The conventional hash table 20 uses a conventional collision resolution scheme known as LISCH (Late Insertion Standard Coalesced Hashing). In LISCH, a key that is hashed to a key field occupied by another key is stored in a first empty key field, or slot, from the bottom of the hash table 20. Each field 22, 24, and 26 includes locations 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10. Thus, the key fields are 22-0, 22-1, 22-2, 22-3, 22-4, 22-5, 22-6, 22-7, 22-8, 22-9, and 22-10 with corresponding link fields 24-0, 24-1, 24-2, 24-3, 24-4, 24-5, 24-6, 24-7, 24-8, 24-9, and 24-10. Suppose the keys 27, 18, 29, 28, 39, 13, 16, 42, and 17 are desired to be stored in the hash table 20 using the hash function key mod 11. The first two keys, 27 and 18, hash to different home locations. Thus, the key 27 is stored in key field 22-5 while the key 18 is stored in key field 22-7. Note that other data corresponding to these keys may be stored in the remaining field(s) 26. However, the key 29 hashes to the same home key field (key field 22-7) as the key 18. Using LISCH, the key 29 is stored in the first open key field from the bottom of the hash table 20. Thus, the key 29 is stored in key field 22-10 and the value for (or address of) the location 10 is stored in the link field 24-7. The next key 28 hashes to key field 22-6, which is empty. The key 28 is stored in its home key field 22-6. The key 39 also hashes to key field 22-6. There is a collision between the keys 28 and 39. Using LISCH, the key 39 is stored in the key field 22-9 and the address for key field 22-9 is stored in the link field 24-6. The key 13 hashes to empty key field 22-2. The key 13 is thus stored in its home key field 22-2. However, the key 16 hashes to the same key field 22-5 as the key 27.
Using LISCH, the key 16 is stored in the next empty key field from the bottom of the hash table 20, or key field 22-8, and the address for key field 22-8 is stored in the link field 24-5. The next key 42 hashes to a home key field 22-9. However, the key 39 is already stored in this key field. Using LISCH, the key 42 is stored in the next empty key field from the bottom, or key field 22-4, and the address for key field 22-4 is stored in the link field 24-9. The last key, 17, would map to key field 22-6. Thus, the last key 17 is part of the chain starting with the key 28. Using LISCH, the key 17 is stored in the next empty key field 22-3, and its location is added to the link field of the last key in the chain, or link field 24-4.
Note that the table 20 has a chain that includes 28, 39, 42, and 17. However, only keys 28, 39 and 17 would hash to the same home key field 22-6. The key 42 is included in the chain because LISCH places part of the chain hashing to home key field 22-6 in the home key field (key field 22-9) of the key 42. Using LISCH, therefore, the chain including 42 and the chain including the keys 17, 28, and 39 coalesce, or combine. Because the chains coalesce, the average number of probes increases. In the example shown, the average number of probes required to reach a key of the example keys 27, 18, 29, 28, 39, 13, 16, 42, and 17 is 1.8 probes.
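The LISCH example above can be reproduced with a short sketch. The sketch assumes the table is modeled as two parallel lists (one for the key fields, one for the link fields), that None marks an empty field, and that the table always has a free slot; the function names lisch_insert and probes are hypothetical and are used only for illustration.

```python
TABLE_SIZE = 11


def lisch_insert(keys, link, key):
    """Insert key using late insertion standard coalesced hashing (sketch)."""
    loc = key % TABLE_SIZE
    if keys[loc] is None:
        keys[loc] = key  # home location is free
        return
    # Walk to the end of whatever chain passes through the home location.
    while link[loc] is not None:
        loc = link[loc]
    # Take the first empty slot, scanning from the bottom of the table.
    free = next(i for i in range(TABLE_SIZE - 1, -1, -1) if keys[i] is None)
    keys[free] = key
    link[loc] = free  # late insertion: the new key joins the end of the chain


def probes(keys, link, key):
    """Count the slots examined to reach a key that is known to be stored."""
    loc, count = key % TABLE_SIZE, 1
    while keys[loc] != key:
        loc = link[loc]
        count += 1
    return count


example_keys = [27, 18, 29, 28, 39, 13, 16, 42, 17]
lisch_keys = [None] * TABLE_SIZE
lisch_link = [None] * TABLE_SIZE
for k in example_keys:
    lisch_insert(lisch_keys, lisch_link, k)

lisch_total = sum(probes(lisch_keys, lisch_link, k) for k in example_keys)
print(lisch_keys)   # [None, None, 13, 17, 42, 27, 28, 18, 16, 39, 29]
print(lisch_total)  # 16
print(round(lisch_total / len(example_keys), 1))  # 1.8
```

In this sketch the key 17 needs four probes (locations 6, 9, 4, and 3) because its chain has coalesced with the chain for the key 42, which accounts for most of the difference from a non-coalescing scheme.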
FIG. 2 depicts another conventional hash table 20′. The conventional hash table 20′ includes a key field 22′, link field 24′, and other optional fields 26′. Each field includes locations 0′, 1′, 2′, 3′, 4′, 5′, 6′, 7′, 8′, 9′, and 10′. Thus, hash table 20′ includes key fields 22-0′, 22-1′, 22-2′, 22-3′, 22-4′, 22-5′, 22-6′, 22-7′, 22-8′, 22-9′, and 22-10′ with corresponding link fields 24-0′, 24-1′, 24-2′, 24-3′, 24-4′, 24-5′, 24-6′, 24-7′, 24-8′, 24-9′, and 24-10′. The conventional hash table 20′ uses a conventional collision resolution scheme that precludes coalescing of chains. The particular scheme used is known as DCWC (Direct Chaining Without Coalescing). Other collision resolution schemes that preclude coalescing of chains are known. In DCWC, when a key's home location is occupied by another key and the other key is not at its home location, the other key is moved to a new location. For example, suppose the keys 27, 18, 29, 28, 39, 13, 16, 42, and 17 are desired to be stored in the hash table 20′, again using the hash function key mod 11. In such a case, the keys would be stored in the same manner as for FIG. 1 until key 42 is hashed. However, the key 42 would hash to the home key field 22-9′, in which the key 39 resides. Because the key field 22-9′ is not the home key field of the key 39, DCWC is employed. In particular, the chain including keys 28 and 39 is moved (taken out and reinserted), allowing the key 42 to be stored in its home key field 22-9′. The key 28 is stored in the home key field 22-6′ and the key 39 is reinserted in the next open key field from the bottom of the table 20′, the key field 22-4′. The last key in the list, key 17, is in a chain that includes 28 and 39 and is stored accordingly. Because the chains do not coalesce, the average number of probes decreases. In the example shown, the average number of probes required to reach a key of the example keys 27, 18, 29, 28, 39, 13, 16, 42, and 17 is 1.6 probes.
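The DCWC example above can likewise be sketched in code. This is one possible reading of the description, not a definitive implementation: it assumes the table is modeled as parallel key and link lists with None marking an empty field, that no keys are deleted, that the table always has a free slot, and that a displaced chain is removed in full and reinserted in its original order. The names dcwc_insert, home, first_free_from_bottom, and probes are hypothetical.

```python
TABLE_SIZE = 11


def home(key):
    return key % TABLE_SIZE


def first_free_from_bottom(keys):
    return next(i for i in range(TABLE_SIZE - 1, -1, -1) if keys[i] is None)


def dcwc_insert(keys, link, key):
    """Insert key so that chains with different home locations never merge (sketch)."""
    loc = home(key)
    if keys[loc] is None:
        keys[loc] = key
        return
    if home(keys[loc]) != loc:
        # The occupant is away from its own home: take its whole chain out,
        # give the new key its home slot, then reinsert the chain.
        chain, cur = [], home(keys[loc])
        while cur is not None:
            chain.append(keys[cur])
            nxt = link[cur]
            keys[cur], link[cur] = None, None
            cur = nxt
        keys[loc] = key
        for moved in chain:
            dcwc_insert(keys, link, moved)
        return
    # The occupant is at home: append the new key to the end of that chain.
    while link[loc] is not None:
        loc = link[loc]
    free = first_free_from_bottom(keys)
    keys[free] = key
    link[loc] = free


def probes(keys, link, key):
    """Count the slots examined to reach a key that is known to be stored."""
    loc, count = home(key), 1
    while keys[loc] != key:
        loc = link[loc]
        count += 1
    return count


example_keys = [27, 18, 29, 28, 39, 13, 16, 42, 17]
dcwc_keys = [None] * TABLE_SIZE
dcwc_link = [None] * TABLE_SIZE
for k in example_keys:
    dcwc_insert(dcwc_keys, dcwc_link, k)

dcwc_total = sum(probes(dcwc_keys, dcwc_link, k) for k in example_keys)
print(dcwc_keys)   # [None, None, 13, 17, 39, 27, 28, 18, 16, 42, 29]
print(dcwc_total)  # 14
print(round(dcwc_total / len(example_keys), 1))  # 1.6
```

Under these assumptions the key 42 is found in one probe and the key 17 in three, which gives a total of 14 probes over the nine keys.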
Although hash tables and collision resolution schemes, particularly those that preclude coalescing of chains, function, one of ordinary skill in the art will readily recognize that further improvements in hashing are desired. In particular, a method and system that allow for more efficient searching of keys are desired.
Accordingly, what is needed is a more efficient method and system for storing chains subject to collisions. The present invention addresses such a need.