In relational databases, records are stored as rows (tuples) within tables (relations), with each data field in a record represented as a column in the table. A join is a query operation that selects data from one or more tables using a single "select" operation that specifies values for certain column or columns. The specified values are often referred to as the "join attributes." The rows of data which result from the join operation are created by forming the Cartesian product of the specified tables and eliminating rows that do not meet the specified join selection criteria.
A join operation to retrieve rows from multiple tables is the most frequently used operation in a relational database system. It is also the most expensive in terms of computer processing and file input/output (I/O) time, especially when large tables are involved and the join operation is of an ad-hoc type, i.e., at least one of the tables is not keyed on the fields that define the join attributes. Ad-hoc join operations require fill table scans of the tables in order to find rows with join attributes that satisfy the selection criteria, a process which requires a large number of central processor cycles and file I/O's. Additionally, depending on the exact nature of the join operations, full table scans may have to be performed multiple times for a single query.
Join operations fall into two basic categories: an "equijoin" operation in which the selection criteria contains an equality condition, such as requiring a match on one or more columns common to two tables, as opposed to other join operations specify either less-than or greater-than conditions. In either case, the algorithm that performs the join operation must compare each row in one table with each row in every other table designated in the join operation. In a computer system with insufficient memory to hold all the tables in memory, a join operation becomes quite costly in terms of file I/O time. Equijoin operations are of particular focus for reducing processing costs as they are the most prevalent type of ad-hoc query issued against a relational database in a transactional environment.
Various algorithms have been developed to minimize the processing costs associated with join operations. Three major types currently in use are nested-loops, sort-merge, and hash-based. All of the existing algorithms in these categories have flaws, however.
In a nested loop, one of the tables is fully scanned to find a match for a row in a second table. Because the first table must be scanned once for each row in the second table, the nested loop incurs high file I/O unless some form of indexing is first performed on the tables. Even then, the indices must all fit into memory to realize significant I/O savings; otherwise the reduction in file I/O is minimal.
Sort-merge algorithms sort the tables on the join attribute(s) and then perform a merge function on the matching rows. A sort-merge algorithm is very efficient when the column defined as the selection criteria is used as a key for each of the tables, but otherwise requires multiple scans of the tables to find all the matching rows. Sorting operations are costly in terms of central processing cycles and in the number of file I/O's that must be performed.
Hash-based algorithms are driven by the fact that in the majority of join operations, only a few of the rows in the tables satisfy the selection criteria so that most of the comparisons of the nested-loop and sort-merge algorithms produce no results. This is particularly true in the case of equijoins where only exact matches are specified. In a basic hash join function, a mathematical function called a hash function or hashing algorithm partitions each relational table into multiple groups of rows based on the values stored in the column(s) specified as join attributes. Each group of rows is referred to as a "bucket." Because the same hashing algorithm is used on each table, rows containing identical join attribute values will be allocated into corresponding buckets. Thus, only the entries in a few buckets must be compared to find rows satisfying the selection criteria. Moreover, as one of the inputs into many hashing algorithms is the number of buckets required to hold all the rows in the largest table, the number of buckets is frequently chosen based on a bucket that is sized to fit within the available memory. Therefore, I/O costs can be further reduced. However, given the random distribution of column values in most cases, a hashing algorithm frequently results in few or no rows allocated to some buckets while too many rows to fit within a bucket are allocated to other buckets. Bucket overflow defeats the purpose of sizing a bucket to fit within memory and reduces the efficiency of a hash-based equijoin operation.
The GRACE algorithm implements the basic hash join function in a first phase but adds a second phase which uses the same hash-based algorithm to group the rows for each bucket associated with one of the relational tables into a "hash table." (Optionally, a different hashing algorithm is used to build the hash tables to alleviate problems with hash duplicates.) The GRACE algorithm then searches the hash tables using entries from a second relational table. The hash tables thus increase the efficiency of scanning the buckets for matches. The GRACE algorithm relies on a very large number of buckets to reduce the chance of bucket overflow but then combines buckets with few entries to create more optimally sized buckets ("bucket tuning"). However, bucket tuning destroys the relationship among the rows in the buckets and increases the amount of searching necessary to find matches.
Another variant on the basic hash join function, the hybrid hash join, is designed to deal with the problem of bucket overflow. The hybrid hash join algorithm retains the first bucket of each relational table in memory during a partitioning phase. All other buckets are written to disk. Because the same hash algorithm used to partition the buckets is also used to build the hash table during a joining phase, the hash table for each of the first buckets is already constructed at the end of the partitioning phase. Thus, fewer file I/O's are required and computation time to construct the hash tables is saved using the hybrid hash join function. However, to completely process the majority of join operations, some of the buckets that have been written to disk must be read into memory and have their hash tables built during the joining phase. Therefore, multiple passes of the second table are necessary, increasing processing time and file I/O's.
Still a third variation on the hash join function is the simple hash join which constructs a single bucket during each pass through one table. Rows which do not hash into the current bucket are either written back to disk or ignored. The current bucket is then probed by hit rows from a second table to find matches. If the non-hit rows in the first table are ignored, the number of file writes is significantly reduced. However, the number of file reads increases since several passes must be made through the first table before all the rows have been allocated to a bucket and probed.
Therefore, there is a need for a hash-based process, directed particularly to equijoin operations, that reduces minimizes processor and I/O costs beyond the reductions achievable by current hash-based functions.