In the context of data tables, a query may specify criteria indicating values that may be in more than one of the tables. To produce the query result, a computer query process may perform a join operation to return rows from the tables, where query process selects the rows based on the values indicated by criteria of the query. A column to be matched in two tables for joining the tables may be referred to as a “join key column.” More than one column may be matched for joining two tables, in which case the columns to be matched are all join key columns and those columns may be referred to collectively as the “join key.
The query process may use a hash table in performing a join operation, in which case the join operation is referred to as a “hash join.” (To help avoid confusion between the tables joined in a join operation, on the one hand, and a hash table used to perform the join operation, on the other hand, the hash table will be referred to herein as a hash array.) For a hash join operation, the query process typically uses the smaller of two tables, which may be referred to as the “inner table,” to build a hash array based on one or more key columns in the table. In general terms, when building a hash array a hash function computes a hash index that determines where to store each entry in the array based on the value of each entry itself. Thus, in building the hash array for a hash join, the join aspect of the query process computes the hash index for storing each row of the first table based on the one or more key column entries for the row.
Once the hash array is built (done during what may be referred to as hash array “build stage”), the join aspect of the process then scans the larger table, which may be referred to as the “outer table.” The process then probes the hash array to find and fetch payload columns of rows that match key values of the join key column or columns (which may be referred to as hash array “probe” stage). To perform the probing, the join aspect of the process performs the same hash function computation to produce a hash index on each value scanned from the outer table and then determines whether there is an entry in hash array at the location indicated by the index. If there is an entry, this hit usually means the entry was stored there during the building of hash array. The entry is likely to match the value scanned from the outer table and used to generate the hash index for the probing, although the hit may be a false positive, since sometimes more than one value can produce the same hash index, (referred to as a hash “collision”). If the entry does match, the process fetches payload columns of the row associated with the matching entry in the hash array.
As can be understood from the above description, a join operation involves finding matching data and fetching data associated with the matching data. The hash array data structure is used for join operations because it tends to speed up the finding (i.e., looking up) and fetching of data. That is, by looking up elements in an array by index, the time to retrieve an entry from the hash array is independent of where the entry is stored therein.