Database Hash Tables
Computer programs commonly require a method of storing data records in such a manner that they can be quickly retrieved. For example, an address book typically has data records that correspond respectively to different persons. Each data record includes fields corresponding, for example, to a person's name, address, telephone number, and other information. Each data record is typically stored as an integral entity in whatever memory is being utilized. In this example, the name field in each data record is designated as a key, so that when a name is presented to the storage system, the data record can be quickly retrieved based on the key.
Many data structures exist for increasing the speed of storing and retrieving information based on keys. Such data structures allow a computer program to locate desired data records without requiring a search through all available records, somewhat like looking up a word in a dictionary without having to perform a linear search from the beginning of the dictionary. This is particularly important in databases containing a large number of records.
A hash table is an example of a data structure designed to increase the speed and efficiency of database searching. A hash table is a sequence of entries, each of which has a unique address within the table. Each entry has a pointer that references or points to one or more records. The data records referenced by a particular entry are said to be assigned to that entry. The particular entry to which a record is assigned is determined by the record's key. More specifically, an address generation function is used to convert the record's key to an address of a hash table entry, and the record is assigned to this entry.
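As a minimal sketch of the key-to-address conversion described above, the following Python function maps a string key to an entry address. The name address_of and the byte-sum formula are assumptions chosen for illustration; the text does not specify a particular address generation function, and this one does not reproduce the addresses shown in FIG. 1.

```python
def address_of(key: str, table_size: int) -> int:
    """Hypothetical address generation function (not taken from the text).

    Sums the UTF-8 byte values of the key and reduces the sum modulo the
    table size, so the result is always a valid entry address.
    """
    return sum(key.encode("utf-8")) % table_size


TABLE_SIZE = 8  # number of entries in the hash table

for name in ("Alice", "Bob", "Carol"):
    # Every printed address falls in the range 0 to TABLE_SIZE - 1.
    print(name, "->", address_of(name, TABLE_SIZE))
```

Any deterministic function that returns a value in the range of valid entry addresses could play this role; practical functions are usually chosen to spread keys more evenly than a plain byte sum does.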
In general, it is not known ahead of time what key values will arise in conjunction with data records. As a result, it is possible that a particular address generation function will produce an identical address for two distinct key values, and that two records with different keys will be assigned to the same hash table entry. This situation is known as a conflict, and it must be handled by a conflict resolution method.
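The situation can be illustrated with a short sketch that groups keys by the address they generate and reports every address shared by more than one key. The function names and the byte-sum address generation function are assumptions made for this example only.

```python
from collections import defaultdict


def address_of(key: str, table_size: int) -> int:
    # Hypothetical address generation function: byte sum modulo table size.
    return sum(key.encode("utf-8")) % table_size


def find_conflicts(keys, table_size):
    """Return {address: [keys...]} for every address shared by two or more keys."""
    by_address = defaultdict(list)
    for key in keys:
        by_address[address_of(key, table_size)].append(key)
    # Keep only the addresses to which more than one distinct key was assigned.
    return {addr: ks for addr, ks in by_address.items() if len(ks) > 1}


# "ab" and "ba" contain the same bytes in a different order, so a byte-sum
# function necessarily assigns them to the same entry.
print(find_conflicts(["ab", "ba", "d"], 8))
```

Because the number of possible keys normally exceeds the number of table entries, some conflicts are unavoidable whatever function is chosen.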
Depending upon their general strategies for conflict resolution, hash tables are divided into two classes: closed and open. In a closed hash table, each data record is assigned to a particular hash table entry, and conflicts are handled by finding a different entry in the table for one of the conflicting records. In an open hash table, each record is stored in a data structure that is pointed to by a hash table entry, and this data structure is generally capable of containing multiple records; a common such data structure is a linked list.
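For contrast with the open hash table discussed next, the following is a rough sketch of a closed hash table. It assumes linear probing (trying the next entry in sequence) as the way of "finding a different entry"; the text does not name a specific probing strategy, and the class and method names are hypothetical. Deletion is deliberately left out to keep the sketch short.

```python
class ClosedHashTable:
    """Sketch of a closed hash table using linear probing (an assumed strategy)."""

    def __init__(self, size=8):
        # Each slot holds a (key, record) pair, or None if the entry is unused.
        self.slots = [None] * size

    def _address(self, key):
        # Hypothetical address generation function: byte sum modulo table size.
        return sum(key.encode("utf-8")) % len(self.slots)

    def insert(self, key, record):
        """Store the record and return the address of the entry actually used."""
        start = self._address(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            # Use the first free entry, or overwrite an entry with the same key.
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, record)
                return i
        raise OverflowError("hash table is full")

    def find(self, key):
        """Return the record for the key, or None if it is not in the table."""
        start = self._address(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None  # an unused entry ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


table = ClosedHashTable()
print(table.insert("ab", "record 1"))  # stored at its generated address
print(table.insert("ba", "record 2"))  # conflict: moved to the next entry
```

In this sketch every record occupies a table entry directly, so the table can hold no more records than it has entries. An open hash table avoids that limit by letting each entry point to a structure holding several records.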
A classic open hash table 10 with linked lists is illustrated in FIG. 1. In the example of FIG. 1, eight records (referenced by numeral 11) have been assigned to entries (referenced generally by numeral 12) of hash table 10. In the following discussion, entries will be referenced by their addresses: entry 0, entry 1, and so on.
The key values of the records are "A", "B", "F", "J", "L", "P", "V", and "X". Key value "A" generates address 0, so the record with this key is pointed to by a pointer from entry 0 of the hash table. Key values "P" and "X" both generate address 1, so their corresponding records are stored as a linked list, one element of which is pointed to by entry 1 of the hash table. None of the records have key values that generate address 2, so entry 2 of the hash table does not point to any records.
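The arrangement just described can be sketched in Python as a list of eight entries, each holding either None or a pointer to the first record of a linked list. The sketch makes several assumptions. The class and function names are hypothetical. New records are appended to the tail of a chain. The address generation function is replaced by a lookup of the addresses stated in the text. The address 5 for "B", "F", and "J" comes from the FIG. 1 search example. Keys "L" and "V" are omitted because the text does not state which entries they are assigned to.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    key: str
    next: Optional["Record"] = None  # pointer to the next record in the chain


# Addresses taken from the description of FIG. 1 ("L" and "V" are not given).
STATED_ADDRESS = {"A": 0, "P": 1, "X": 1, "B": 5, "F": 5, "J": 5}


def insert(table: List[Optional[Record]], key: str) -> None:
    """Append a new record to the tail of the chain at the key's entry."""
    record = Record(key)
    address = STATED_ADDRESS[key]
    if table[address] is None:
        table[address] = record  # first record assigned to this entry
        return
    tail = table[address]
    while tail.next is not None:
        tail = tail.next
    tail.next = record


def chain_keys(table: List[Optional[Record]], address: int) -> List[str]:
    """Collect the keys of all records assigned to one entry, in chain order."""
    keys = []
    node = table[address]
    while node is not None:
        keys.append(node.key)
        node = node.next
    return keys


table: List[Optional[Record]] = [None] * 8
for key in ["A", "P", "X", "B", "F", "J"]:
    insert(table, key)

for address in range(len(table)):
    print(address, chain_keys(table, address))
```

Entry 2 prints an empty chain, matching the statement that no key generates address 2. In this reduced sketch, entries 3, 4, 6, and 7 are also empty because "L" and "V" were left out.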
To search the hash table for a target record that has a particular key, the key value is first converted into an address by the address generation function. The hash table in FIG. 1 has 8 entries, so the address produced by the function must be in the range 0 to 7. As an example of finding a particular record using the hash table, assume that the address generation function produces an address of 5 for key value "J". Entry 5 points to a record. However, this record has a key value of "B" and is not the desired record. Accordingly, a pointer associated with the "B" record is examined to find the next record in the linked list. This next record, having a key value of "F", is not the desired record either, so its associated pointer is used to find the next record. This third record has a key value of "J", indicating that the correct record has been found.
As another example, consider searching for the data record associated with key value "D". Suppose that the address generation function produces an address of 1 for this key. Entry 1 in the hash table points to a record; however, the record's key value "P" does not match the search key. Accordingly, the linked-list pointer maintained with this record is examined and used to find the next data record that has been assigned to entry 1. The next record has a key value of "X", which again does not match the search key. In this case, the pointer associated with the record is null, indicating the end of the linked list. Since no match has been found, the search concludes with the result that the desired record does not exist in the database.
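Both searches can be traced with a short sketch. It assumes the chains described for FIG. 1 (entry 1 holding "P" then "X", entry 5 holding "B", "F", then "J") and takes the generated address as an argument, since the text states the addresses for "J" and "D" without specifying the function that produces them. The names Record, make_chain, and search are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    key: str
    next: Optional["Record"] = None  # None marks the end of the linked list


def make_chain(keys: List[str]) -> Optional[Record]:
    """Build a linked list whose records appear in the given key order."""
    head = None
    for key in reversed(keys):
        head = Record(key, head)
    return head


def search(table, key: str, address: int) -> Tuple[Optional[Record], List[str]]:
    """Walk the chain at `address`; return (record or None, keys examined)."""
    examined = []
    node = table[address]
    while node is not None:
        examined.append(node.key)
        if node.key == key:
            return node, examined  # match found
        node = node.next  # follow the pointer to the next record
    return None, examined  # null pointer reached: no such record


# Chains as described in the text; entries not mentioned are left empty here.
table = [None] * 8
table[0] = make_chain(["A"])
table[1] = make_chain(["P", "X"])
table[5] = make_chain(["B", "F", "J"])

found, path = search(table, "J", 5)  # address 5 as assumed in the text
print(found.key, path)

missing, path = search(table, "D", 1)  # address 1 as assumed in the text
print(missing, path)
```

The list of examined keys shows the cost of each lookup: three comparisons to find "J", and two comparisons before the search for "D" reaches the end of the chain and reports that no record exists.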