1. Field of the Invention
The present invention relates to computer data structures and algorithms. More particularly, the present invention relates to hash tables and hashing functions and, more specifically, to dynamically-sized hash tables and lazy rehashing algorithms.
2. Related Art
Hash tables employ hashing functions, or algorithms, to store and search for records based on a key within or assigned to each record. Records can include, for example, employee records such as payroll records, data structures such as page frame data structures, words from a dictionary, et cetera. Page frame data structures are data structures that store identification and state information for pages of memory, or page frames. Keys can be numbers or words. For example, where records are employee payroll records, a key can be an employee identification number. Where records are page frame data structures, a key can be a logical offset of a page of a memory object that is stored in a page of memory that is represented by the page frame data structure.
Hashing functions include open hashing functions and closed hashing functions. An open hashing function typically distributes records among the hash buckets of a hash table array according to a hashing algorithm and the number of hash buckets in the hash table. Each hash bucket includes a pointer to a linked list of stored records. Each record is hashed to a hash bucket based on the key value of the record. The linked list of a bucket can hold any number of records. A record having a particular key value will always be associated with the same bucket. A hash table is allocated with a desired number of buckets. Records can then be added to the hash table as necessary.
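The open hashing arrangement described above can be illustrated with a minimal sketch. The names Record, OpenHashTable, bucket_index and insert are hypothetical, and the modulo hashing function is an assumption chosen for simplicity; the sketch is one possible arrangement, not a definitive implementation.

```python
class Record:
    """A stored record; `next` links it into its bucket's linked list."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.next = None


class OpenHashTable:
    """Fixed-size open hash table (illustrative names, not from any library)."""

    def __init__(self, num_buckets):
        # The bucket count is fixed when the table is allocated.
        # Each bucket holds a pointer to the head of a linked list, or None.
        self.buckets = [None] * num_buckets

    def bucket_index(self, key):
        # Assumed hashing function: integer key modulo the bucket count.
        return key % len(self.buckets)

    def insert(self, key, value):
        # Push the new record onto the front of its bucket's linked list.
        record = Record(key, value)
        index = self.bucket_index(key)
        record.next = self.buckets[index]
        self.buckets[index] = record
        return index


# Employee identification numbers used as keys.
table = OpenHashTable(num_buckets=4)
placements = [table.insert(emp_id, "payroll record")
              for emp_id in (1001, 1002, 1005, 1009)]
```

In this sketch, a given key always maps to the same bucket, and three of the four example keys share one bucket, so that bucket's linked list holds three records while two buckets stay empty.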
When searching for a record in a hash table, a key value for the record is provided to the hashing function. The hashing function identifies a bucket to search in. The bucket provides a starting point for the search. If the record exists in the hash table, it will be associated with the identified hash bucket. The search for the record begins with the first record in the linked list associated with a hash bucket and continues with successive records in the linked list until the record is found or until the search has exhausted all of the records in the linked list. If the search identifies the record, the record can be accessed. If the record is not found among the linked list of records associated with the bucket, the record is not in the hash table. Since records are distributed among a number of hash buckets, and since a search for a particular record is immediately narrowed to a search of only one of the hash buckets, hashing functions limit the amount of searching necessary to find a record. Hash tables are thus compact data structures that can be used to store and retrieve data in an organized fashion.
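The search procedure described above can be sketched as follows. The function names, the modulo hashing function, and the tuple representation of list nodes are all assumptions made for illustration. The search function also returns the number of records it examined, to show that only one bucket's list is walked.

```python
def bucket_index(key, num_buckets):
    # Assumed hashing function: integer key modulo the bucket count.
    return key % num_buckets


def insert(buckets, key, value):
    # Each node is a (key, value, next_node) tuple; new nodes go at the head.
    index = bucket_index(key, len(buckets))
    buckets[index] = (key, value, buckets[index])


def search(buckets, key):
    """Return (value, records_examined); value is None if the key is absent."""
    node = buckets[bucket_index(key, len(buckets))]
    examined = 0
    while node is not None:
        examined += 1
        node_key, value, next_node = node
        if node_key == key:
            return value, examined
        node = next_node
    # The identified bucket's list is exhausted, so the record is not stored.
    return None, examined


# Logical page offsets used as keys for page frame data structures.
buckets = [None] * 4
for offset in (0, 4, 8, 5):
    insert(buckets, offset, f"frame@{offset}")

found = search(buckets, 4)     # record present in bucket 0
missing = search(buckets, 12)  # hashes to bucket 0 but was never inserted
```

The lookup for key 12 examines only the three records in bucket 0 and never touches the record stored in bucket 1.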
So long as the number of records within each hash bucket remains manageable, the linked list of records in a bucket can be searched in a relatively short period of time. However, as the number of records in a hash bucket increases, the amount of time required to search the bucket's linked list also increases. Thus, in conventional hashing, it is important to know how many records need to be hashed when setting the number of buckets at initialization. Since the number of hash buckets is fixed at initialization, conventional hash tables are not scalable or dynamic. Although the number of hash buckets could be overestimated to ensure that the number of records per bucket remains manageable, overestimation results in unused hash buckets, which consume valuable memory.
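The trade-off between an undersized and an oversized fixed bucket count can be made concrete with a small sketch. It assumes sequential integer keys and a modulo hashing function, neither of which is specified above. The function name chain_lengths is hypothetical.

```python
def chain_lengths(num_records, num_buckets):
    """Count how many records land in each bucket of a fixed-size table.

    Assumes keys 0..num_records-1 and a modulo hashing function.
    """
    lengths = [0] * num_buckets
    for key in range(num_records):
        lengths[key % num_buckets] += 1
    return lengths


# Bucket count sized for the expected load: short lists.
expected_load = chain_lengths(64, 16)

# Same bucket count, 100 times more records: lists grow 100 times longer.
heavy_load = chain_lengths(6400, 16)

# Overestimated bucket count: short lists, but most buckets are never used.
oversized = chain_lengths(64, 1024)
unused_buckets = oversized.count(0)
```

Under these assumptions, the 16-bucket table holds 4 records per bucket at the expected load and 400 per bucket at the heavy load, while the 1024-bucket table leaves 960 buckets empty.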
Hashing functions and hash tables are discussed in, for example, Aho, Hopcroft and Ullman, Data Structures and Algorithms, Addison-Wesley, at, for example, pp. 112-135, 128, 162, 168 and 363-365 (1983), incorporated herein in its entirety by reference; and Horowitz and Sahni, Data Structures in PASCAL, Computer Science Press, at, for example, pp. 425-457 (4th Ed., 1994), incorporated herein in its entirety by reference.
When the number of records to be stored and searched is not known beforehand, tree data structures are generally employed. Tree structures can be easily grown or added to and are thus scalable, and they have good search times. However, tree structures require numerous pointers, such as parent pointers, sibling pointers, and child pointers. Thus, although tree structures are flexible and scalable, they take up much more memory space than hash tables. As a result, where very compact data structures are required, hash tables are preferred.
What is needed is a system, method and computer program product for dynamically sizing hash tables in order to keep them as compact as possible and to avoid long bucket links.