Within a computer system, various algorithms may be used to search a database for items that satisfy a specified property. For example, a hash table uses algorithms to search for requested information. A hash table is used in a computer to enable fast lookup of data within a database. Furthermore, a hash table is made up of a finite number of head buckets. Each head bucket is capable of holding a limited amount of information.
Hash tables operate by associating a key (such as a person's name) with a corresponding value (such as that person's phone number) within the data structure. This operation works by using a hash function to transform the key into a hash value. A hash function is a reproducible algorithm that turns data into a number. For example, a hash function may substitute and transpose parts of the key to create what is hoped to be a unique number, termed a ‘hash value’. The hash value is used to locate the head bucket into which the key's corresponding value was placed.
For example, suppose a key is ‘John Smith’ with a corresponding value of phone number ‘555-555-5555’. A hash function algorithm transforms the key, John Smith, into the hash value, 642. John Smith's corresponding value, 555-555-5555, is then placed in head bucket 642. In response to a user's request for the information corresponding to John Smith, a computer hashes the key again and recognizes the data sought as being located in head bucket 642. Head bucket 642 is accessed and the phone number, 555-555-5555, is retrieved.
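The store-and-retrieve sequence in this example can be sketched as follows. This is a minimal illustration only: the table size `NUM_BUCKETS`, the function `toy_hash`, and the helpers `put` and `get` are hypothetical names introduced here, and the toy hash is not expected to reproduce the hash value 642 used in the text.

```python
NUM_BUCKETS = 1024  # assumed table size, chosen only for illustration


def toy_hash(key: str) -> int:
    """Reproducible toy hash: fold the characters of the key into a bucket index."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % NUM_BUCKETS
    return h


# One slot per head bucket; None marks an empty bucket.
buckets = [None] * NUM_BUCKETS


def put(key: str, value: str) -> None:
    """Place the value in the head bucket selected by the key's hash value."""
    buckets[toy_hash(key)] = (key, value)


def get(key: str):
    """Hash the key again, index the same head bucket, and return its value."""
    entry = buckets[toy_hash(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


put("John Smith", "555-555-5555")
phone = get("John Smith")
```

Because the hash function is reproducible, the lookup reaches the same head bucket that was used when the value was stored. Collisions are ignored in this sketch.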
In another example, suppose a key is the birth date, ‘Aug. 27, 1970’, which hashes to the hash value ‘6’. The birth date's corresponding value is ‘Julie’. Julie is placed into head bucket 6. The next key to be hashed is the birth date ‘Jul. 19, 1970’ with a corresponding value of ‘Bruce’. The hash function also hashes this birth date to ‘6’ and attempts to place the corresponding value Bruce into head bucket 6. However, head bucket 6 is already full. Since two or more keys hashed to the same hash value cannot be stored in the same head bucket, a collision is created with the attempted addition of data to an already full head bucket. Collisions are resolved by means of “collision chains”, where the head buckets point to a linked list containing the colliding entries.
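A collision chain of this kind can be sketched as follows. The sketch rests on stated assumptions: `Node`, `toy_hash`, `insert`, and `lookup` are hypothetical names, the toy hash looks only at the four-digit year, and the table size of 491 is chosen solely so that both birth dates hash to 6 as in the example. For simplicity, every entry is kept in the chain, including the first one.

```python
NUM_BUCKETS = 491  # assumed size, chosen only so that 1970 % 491 == 6


class Node:
    """One entry in a collision chain (a singly linked list)."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


def toy_hash(birth_date: str) -> int:
    """Toy hash that uses only the four-digit year at the end of the key."""
    return int(birth_date[-4:]) % NUM_BUCKETS


# Each head bucket points to the first node of its chain, or None if empty.
heads = [None] * NUM_BUCKETS


def insert(key: str, value: str) -> None:
    """Prepend the entry to the chain of the bucket the key hashes to."""
    index = toy_hash(key)
    heads[index] = Node(key, value, heads[index])


def lookup(key: str):
    """Walk the chain of the key's bucket until the key is found."""
    node = heads[toy_hash(key)]
    while node is not None:
        if node.key == key:
            return node.value
        node = node.next
    return None


insert("Aug. 27, 1970", "Julie")
insert("Jul. 19, 1970", "Bruce")  # collides with the previous key in bucket 6
```

Both values remain retrievable after the collision, because the lookup compares keys while walking the chain rather than relying on the hash value alone.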
Using the hash value, it is possible to find the information with a limited number of operations such as the following: computing the hash value, indexing the head bucket, and optionally walking the collision chain. Finding an entry in a large set in a small number of operations is the primary reason for using a hash table. When there are no collisions, finding the data associated with a key can be done with a fixed number of operations, which depends neither on the number of elements in the hash table nor on the number of head buckets.
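The effect of collisions on the number of lookup operations can be illustrated with a small sketch. The names `build` and `lookup_steps` are hypothetical, the keys are plain integers hashed by taking the remainder modulo the number of buckets, and Python lists stand in for the collision chains.

```python
def build(num_buckets, keys):
    """Build a table of chains; integer keys are hashed by taking key % num_buckets."""
    table = [[] for _ in range(num_buckets)]
    for key in keys:
        table[key % num_buckets].append(key)
    return table


def lookup_steps(table, key):
    """Return how many chain entries are examined while searching for key."""
    chain = table[key % len(table)]  # compute the hash value, index the head bucket
    steps = 0
    for candidate in chain:          # optionally walk the collision chain
        steps += 1
        if candidate == key:
            break
    return steps


keys = range(100)
medium = build(128, keys)   # 100 keys, 128 buckets: no collisions
large = build(4096, keys)   # same keys, 4096 buckets: no collisions
small = build(8, keys)      # 100 keys, 8 buckets: long collision chains

steps_medium = lookup_steps(medium, 99)
steps_large = lookup_steps(large, 99)
steps_small = lookup_steps(small, 99)
```

Without collisions the lookup examines a single entry whether the table has 128 or 4096 head buckets. With only 8 head buckets, the same key sits at the end of a 13-entry chain.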
In other instances, a computer system may be required to perform a global walk of all head buckets. For example, to delete all the entries in a hash table, it is necessary to examine all of the head buckets one after the other. Such a walk requires a larger number of operations, at least proportional to the number of head buckets. The number of operations required remains high even when there is little or no data stored in the hash table. Therefore, a global walk on a nearly empty hash table is an inefficient operation.
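The cost of such a global walk can be sketched as follows. The function `clear_all` and the table size of 4096 are assumptions made for illustration, and the single stored entry reuses the earlier John Smith example.

```python
def clear_all(buckets):
    """Empty every head bucket and return how many buckets were examined."""
    examined = 0
    for index in range(len(buckets)):
        examined += 1                 # every bucket is visited, full or empty
        if buckets[index] is not None:
            buckets[index] = None
    return examined


# A nearly empty table: 4096 head buckets, only one of which holds data.
nearly_empty = [None] * 4096
nearly_empty[642] = ("John Smith", "555-555-5555")

examined = clear_all(nearly_empty)
```

Although only one head bucket contains data, all 4096 head buckets are examined, so the work done is governed by the size of the table rather than by the amount of data in it.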
Additionally, for reasons such as performance, it is desirable to have a large number of head buckets within a hash table, as this reduces the chance of collisions. Consequently, when there are a large number of head buckets, it takes a greater amount of time to walk all the head buckets in the hash table. There exist several hash function techniques and hash table construction techniques to deal with inefficiencies during global walking. One simple technique would be to keep a linked list of the head buckets containing data, which can be called “active” head buckets. This technique requires an additional field to be added to the head buckets, to store the link between elements of the linked list.
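The active-list technique can be sketched as follows, under stated assumptions. `Bucket`, `Table`, `next_active`, and `active_head` are hypothetical names, integer keys are hashed by taking the remainder modulo the number of buckets, and a Python list stands in for each collision chain.

```python
class Bucket:
    """A head bucket extended with a link field for the active list."""

    def __init__(self):
        self.entries = []        # stand-in for the collision chain
        self.next_active = None  # the additional field: link to the next active bucket


class Table:
    def __init__(self, num_buckets):
        self.buckets = [Bucket() for _ in range(num_buckets)]
        self.active_head = None  # first bucket in the list of active buckets

    def insert(self, key, value):
        bucket = self.buckets[key % len(self.buckets)]
        if not bucket.entries:
            # The bucket is about to become active: link it into the active list.
            bucket.next_active = self.active_head
            self.active_head = bucket
        bucket.entries.append((key, value))

    def clear(self):
        """Delete all entries by walking only the active buckets."""
        visited = 0
        bucket = self.active_head
        while bucket is not None:
            following = bucket.next_active
            bucket.entries.clear()
            bucket.next_active = None
            visited += 1
            bucket = following
        self.active_head = None
        return visited


table = Table(4096)
table.insert(5, "a")
table.insert(6, "b")
table.insert(5 + 4096, "c")  # collides with key 5, so no new bucket becomes active
visited = table.clear()
```

Here the global walk visits two head buckets instead of 4096. The saving depends entirely on the `next_active` field, which is the per-bucket addition that this technique requires.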
One particular embodiment of a hash table is the representation of the page table in the Itanium processor architecture. A page table is a data structure storing information about the virtual memory subsystem of the Itanium processor. The Itanium processor architecture represents its page table using a data structure called a “virtually hashed page table” (VHPT). A VHPT is a hash table where the key is virtual address information (hence “virtually hashed”) and the data is information telling the processor how to map the corresponding virtual address.
Some hash function techniques and hash table construction techniques cannot be implemented on certain systems. For example, the Itanium architecture precisely defines the data layout of elements in a VHPT. Since it is not possible to add a data element to the VHPT head buckets to store a link, VHPT head buckets cannot be placed in a linked list.
Furthermore, reducing the number of head buckets within a VHPT to cut down on the time it takes for the computer to globally walk all head buckets is not feasible. Under normal conditions, the head buckets within a VHPT fill to capacity. Reducing the number of head buckets would degrade a VHPT's performance, as this reduction would cause an increase in collisions.
As can be seen, current hash tables suffer from performance shortcomings when globally walking a partially filled hash table is a frequent operation. Additionally, current hash table techniques for solving such performance shortcomings cannot be used with certain systems, such as the VHPT of the Itanium processor. Thus, a technology which addresses these shortcomings would be advantageous.