Hash indexes are indexes used to look up information based on key values. Specifically, a hash index includes a hash table. The storage areas in a hash table are referred to as “buckets”. Within a hash table, the information associated with any given key value is stored in a hash record in the bucket that corresponds to the hash value produced by applying a hash function to the key value. For example, assume that the key “userid1” is associated with “name1”. Under these circumstances, “name1” is stored in a hash record in the bucket that corresponds to the hash value produced by applying a hash function to “userid1”.
Once data has been stored in the hash table in this manner, the data for any given key value may be quickly obtained by applying the hash function to the key value to produce a hash value, using the hash value to determine the bucket to which the hash value maps, and retrieving the data from a hash record in that bucket. For example, the hash function may be applied to “userid1” to identify the bucket that contains the information (“name1”) for “userid1”. Once the appropriate bucket is identified, the bucket contents may be read to quickly obtain the information (“name1”) associated with the key value (“userid1”). Since the primary purpose of the hash table is to obtain information based on key values as quickly as possible, the lookup operations performed using a hashing scheme should be as fast as possible. Thus, it is desirable to avoid using latches to prevent conflicts between entities that are performing lookup operations and those that are modifying the data itself, since obtaining a latch prior to performing a lookup would significantly increase the time required to perform lookups.
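The store-and-lookup flow described above can be illustrated with a minimal sketch. The class name `HashIndex`, the helper `bucket_of`, the bucket count of 8, and the use of CRC-32 as the hash function are all assumptions made for illustration; none of them is specified by the text.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical size, kept small for illustration


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Apply a hash function to the key and map the hash value to a bucket."""
    # CRC-32 is an assumed stand-in for "a hash function".
    return zlib.crc32(key.encode("utf-8")) % num_buckets


class HashIndex:
    """Hypothetical hash index: a list of buckets holding hash records."""

    def __init__(self, num_buckets: int = NUM_BUCKETS):
        self.num_buckets = num_buckets
        # Each bucket holds a list of (key, value) hash records.
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, key, value):
        bucket = self.buckets[bucket_of(key, self.num_buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite the existing record
                return
        bucket.append((key, value))

    def get(self, key):
        # Hash the key, go to that bucket, and read the matching record.
        bucket = self.buckets[bucket_of(key, self.num_buckets)]
        for k, v in bucket:
            if k == key:
                return v
        return None


index = HashIndex()
index.put("userid1", "name1")
print(index.get("userid1"))  # prints: name1
```

Note that this sketch is single-threaded and therefore sidesteps the latching concern entirely; it only shows the key-to-bucket-to-record path.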
Unfortunately, the performance benefit of using a hash table to access data depends heavily on the size of the hash table relative to the amount of information that must be stored therein. For example, if a hash table is too small, then its buckets will not be able to hold all of the hash records for the key values that hash to them. Various techniques are available for hash buckets to “overflow” into other storage areas, but the larger the quantity of overflow data, the less efficient using the hash table becomes. For example, to find the hash record for a key value that hashes to a particular bucket, it would be highly inefficient to scan a chain of a million buckets that serve as overflow for that bucket. Further, scans of the overflow chain are prone to cache misses, and bucket chaining cannot guarantee constant-time lookup operations. In addition, bucket chaining does not combine well with remote direct memory access (RDMA) to the hash table, since traversing the bucket chain is equivalent to performing a series of dependent pointer dereferences, and thus increases the total number of RDMA read operations that a remote client would have to perform.
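To make the cost of overflow chains concrete, the following sketch counts how many buckets a lookup must read when fixed-capacity buckets overflow into a chain. The names `Bucket`, `ChainedHashTable`, and `BUCKET_CAPACITY`, as well as the CRC-32 hash, are hypothetical choices for illustration. Each bucket read in the chain stands in for one dependent pointer dereference (or one RDMA read for a remote client).

```python
import zlib

BUCKET_CAPACITY = 2  # hypothetical fixed number of records per bucket


class Bucket:
    def __init__(self):
        self.records = []     # at most BUCKET_CAPACITY (key, value) pairs
        self.overflow = None  # next bucket in the overflow chain, if any


class ChainedHashTable:
    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.buckets = [Bucket() for _ in range(num_buckets)]

    def _home(self, key):
        return self.buckets[zlib.crc32(key.encode("utf-8")) % self.num_buckets]

    def put(self, key, value):
        bucket = self._home(key)
        while True:
            for i, (k, _) in enumerate(bucket.records):
                if k == key:
                    bucket.records[i] = (key, value)
                    return
            if bucket.overflow is None:
                break
            bucket = bucket.overflow
        # Key not found; 'bucket' is now the last bucket in the chain.
        if len(bucket.records) >= BUCKET_CAPACITY:
            bucket.overflow = Bucket()
            bucket = bucket.overflow
        bucket.records.append((key, value))

    def get(self, key):
        """Return (value, number_of_buckets_read)."""
        bucket = self._home(key)
        reads = 0
        while bucket is not None:
            reads += 1  # one dependent read per bucket in the chain
            for k, v in bucket.records:
                if k == key:
                    return v, reads
            bucket = bucket.overflow
        return None, reads


# A deliberately undersized table: one bucket, ten keys.
table = ChainedHashTable(num_buckets=1)
for i in range(10):
    table.put(f"key{i}", f"v{i}")

print(table.get("key0"))  # prints: ('v0', 1)
print(table.get("key9"))  # prints: ('v9', 5)
```

With one home bucket and a capacity of two records, ten keys produce a chain of five buckets, so the last key inserted costs five reads instead of one.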
On the other hand, it is also inefficient for a hash table to be too large relative to the data to be stored therein. For example, it is wasteful of computing resources to create a hash table so large that the vast majority of the space allocated to its buckets goes unused.
One difficulty with sizing a hash table correctly is that the amount of information the hash table must store may vary over time. For example, assume that a hash table is used to store a record for each row of a database table. Initially, the database table may only be populated with 100 records. Thus, a relatively small hash table would suffice. However, as data is added to the database table, the number of hash records that need to be added to the hash table also increases. By the time the table reaches a million rows, it is highly likely that the initial size selected for the hash table would be insufficient.
When it is determined that the size of a hash table should be changed (either increased or decreased), the existing hash table may be discarded, and a new hash table may be created. For example, assume that each hash bucket is a fixed size. Under these circumstances, increasing the size of the hash table involves switching from a first hash function that hashes to a smaller number of buckets (e.g. 256) to a second hash function that hashes to a larger number of buckets (e.g. 1,024). Since there is no guarantee that, for any given key, the second hash function will produce the same hash value as the first hash function, all of the existing buckets must be discarded and the hash table must be rebuilt from scratch. During the rebuilding of the hash table, entities that require information from the hash table must either wait for the entire hash table to be rebuilt, or obtain the information some other way. Stated another way, resizing a hash index is typically a “stop the world” blocking operation in which concurrent reads and writes are blocked while the hash table is resized. Blocking concurrent reads is particularly undesirable because the primary purpose of the hash index is to provide fast reads. Consequently, it is preferable for a hashing scheme to allow readers to always make progress on obtaining the data they desire, even while the hash index is being resized.
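The reason a resize forces a full rebuild can be seen in a small sketch. Here the "first" and "second" hash functions are assumed to be CRC-32 reduced modulo 256 and modulo 1,024 respectively; the helper names `bucket_of`, `build`, and `lookup` and the sample keys are hypothetical.

```python
import zlib


def bucket_of(key, num_buckets):
    # Assumed hash function: CRC-32 reduced modulo the bucket count.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def build(records, num_buckets):
    """Build a hash table from scratch for the given bucket count."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in records:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def lookup(buckets, key):
    for k, v in buckets[bucket_of(key, len(buckets))]:
        if k == key:
            return v
    return None


records = [(f"userid{i}", f"name{i}") for i in range(1000)]

old_table = build(records, 256)
# Resizing: every record is rehashed into a brand-new table. In a real
# system, readers would be blocked (or diverted) while this loop runs.
new_table = build(records, 1024)

moved = sum(
    1 for key, _ in records if bucket_of(key, 256) != bucket_of(key, 1024)
)
print(f"{moved} of {len(records)} keys map to a different bucket")
```

Because a key's bucket under the larger table generally differs from its bucket under the smaller one, the old buckets cannot simply be reused in place.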
Various approaches have been developed to address some of the problems relating to hash indexes. For example, Cuckoo Hashing (described in detail at en.wikipedia.org/wiki/Cuckoo_hashing) guarantees an O(1) worst-case lookup cost, but does not support resizability. Specifically, the hash table is fixed in size at compile time, and any rehashing is a complete blocking operation. On the other hand, linear hashing (described in detail at en.wikipedia.org/wiki/Linear_hashing) is a well-known dynamic hashing scheme that allows the hash table to be smoothly resized. However, linear hashing requires the presence of overflow bucket chains, which can lead to lookup operations whose cost exceeds O(1). In addition, when the hash table is being accessed through Remote Direct Memory Access (RDMA), overflow bucket chains can significantly increase the number of RDMA operations required for any given lookup operation.
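As a rough illustration of why Cuckoo Hashing lookups are O(1), the sketch below uses two tables and two hash functions, so a lookup inspects at most two slots. This is a simplified textbook-style variant written for this example, not the scheme of any particular system; the class name `CuckooTable`, the `MAX_KICKS` limit, and the salted BLAKE2b hash functions are assumptions. The table size is fixed at construction, and a failed insert signals that a full rehash would be needed.

```python
import hashlib


class CuckooTable:
    MAX_KICKS = 32  # hypothetical bound on displacements per insert

    def __init__(self, slots_per_table):
        self.n = slots_per_table
        # Two tables of fixed size; each slot holds one (key, value) or None.
        self.tables = [[None] * self.n, [None] * self.n]

    def _slot(self, which, key):
        # Two assumed hash functions, derived by salting BLAKE2b differently.
        salt = b"table0" if which == 0 else b"table1"
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=salt
        ).digest()
        return int.from_bytes(digest, "big") % self.n

    def get(self, key):
        # At most two slot reads, regardless of how full the tables are.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value):
        # Update in place if the key is already stored.
        for which in (0, 1):
            i = self._slot(which, key)
            entry = self.tables[which][i]
            if entry is not None and entry[0] == key:
                self.tables[which][i] = (key, value)
                return
        # Otherwise place the entry, evicting occupants to their other table.
        entry = (key, value)
        which = 0
        for _ in range(self.MAX_KICKS):
            i = self._slot(which, entry[0])
            entry, self.tables[which][i] = self.tables[which][i], entry
            if entry is None:
                return
            which = 1 - which
        # The entry still in hand was not placed; a real implementation
        # would rehash everything (a blocking operation) at this point.
        raise RuntimeError(f"insert failed; rehash required for {entry!r}")


table = CuckooTable(slots_per_table=16)
table.put("userid1", "name1")
table.put("userid2", "name2")
print(table.get("userid1"))  # prints: name1
```

The constant lookup bound comes at the price noted above: when an insert cannot find a free slot within the displacement limit, the only recourse in this sketch is to rebuild the whole table.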
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.