The present invention relates to creating hash tables, and more specifically, to creating lock-free hash tables in parallel.
The performance of many database queries, particularly complex ones that combine results from multiple tables, typically depends on the efficiency of the relational join operator. For queries referencing more than a few rows, a hash join is an efficient join method. Each value of the join column is hashed by a hash function to a value that indexes a bucket (entry) in the hash table. A typical hash join first builds hash tables for one or more of the smaller tables, usually those whose contents more readily fit into memory. It then probes these hash tables with the rows of the larger table, using a different equality join predicate for each table probed. If the join predicate is true, the qualifying rows are added to a result set. The hash function may introduce collisions, in which two distinct values of the same join column hash to the same bucket of the hash table.
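The build and probe phases described above can be sketched as follows. This is a minimal, single-threaded illustration rather than any particular database's implementation. The function names, the representation of rows as dictionaries, the fixed bucket count, and the use of Python's built-in `hash` are all assumptions made for the example.

```python
def build_hash_table(rows, key, num_buckets=8):
    """Build phase: place each row of the smaller table into a bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        # The hash of the join-column value selects the bucket.
        buckets[hash(row[key]) % num_buckets].append(row)
    return buckets


def probe_hash_table(buckets, rows, build_key, probe_key):
    """Probe phase: look up each row of the larger table in the hash table."""
    result = []
    for row in rows:
        bucket = buckets[hash(row[probe_key]) % len(buckets)]
        for candidate in bucket:
            # Distinct values can collide in one bucket, so the equality
            # join predicate is re-checked on the actual column values.
            if candidate[build_key] == row[probe_key]:
                result.append({**candidate, **row})
    return result


# Hypothetical sample data: dept_id 1 and 9 land in the same bucket
# when there are 8 buckets, which illustrates a collision.
departments = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 9, "dept": "ops"}]
employees = [
    {"emp": "a", "dept_id": 1},
    {"emp": "b", "dept_id": 9},
    {"emp": "c", "dept_id": 5},
]

table = build_hash_table(departments, "dept_id")
joined = probe_hash_table(table, employees, "dept_id", "dept_id")
```

In this sketch the two department rows share a bucket, yet each employee row is matched only with the department whose `dept_id` is actually equal, and the employee with no matching department contributes nothing to the result set.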
Commodity multi-core systems typically parallelize the creation and probing of hash tables in an effort to improve join performance. Probing an in-memory hash table in parallel does not require any locking or latching, as it requires only read access. However, building a hash table in parallel typically results in concurrent write access to the hash table, which requires a lock for synchronization. Locking a memory region for exclusive access is a time-consuming operation, particularly when other concurrent insert operations target the same region and must wait for write access.
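The contrast between a lock-based parallel build and a lock-free parallel probe can be sketched as follows. This illustrates the conventional approach described above, not the lock-free technique to which the invention relates. The function names, the single table-wide lock, the thread pool, and the round-robin partitioning of rows are assumptions made for the example. In CPython the interpreter's global lock already serializes much of this work, so the sketch shows the synchronization pattern rather than its performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_build(rows, key, num_buckets=8, num_workers=4):
    """Build a hash table with several threads, serializing writes with a lock."""
    buckets = [[] for _ in range(num_buckets)]
    lock = threading.Lock()  # assumption: one lock guards the whole table

    def insert_chunk(chunk):
        for row in chunk:
            index = hash(row[key]) % num_buckets
            # Every insert waits here while another thread holds the lock.
            with lock:
                buckets[index].append(row)

    chunks = [rows[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(insert_chunk, chunks))  # consume to surface any errors
    return buckets


def parallel_probe(buckets, rows, build_key, probe_key, num_workers=4):
    """Probe with several threads; the table is only read, so no lock is taken."""
    def probe_chunk(chunk):
        matches = []
        for row in chunk:
            for candidate in buckets[hash(row[probe_key]) % len(buckets)]:
                if candidate[build_key] == row[probe_key]:
                    matches.append((candidate, row))
        return matches

    chunks = [rows[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return [m for part in pool.map(probe_chunk, chunks) for m in part]


# Hypothetical sample data.
build_rows = [{"k": i} for i in range(100)]
shared_table = parallel_build(build_rows, "k")
found = parallel_probe(shared_table, [{"k": 5}, {"k": 200}], "k", "k")
```

Every insert in `parallel_build` contends for the same lock, which is the waiting cost the paragraph above describes, whereas `parallel_probe` threads never block one another because each collects its matches in a private list.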