Structured Query Language (SQL) is a standardized language for accessing and updating relational databases. The American National Standards Institute (ANSI) and the International Standards Organization (ISO) published the first standard in 1986. The current standard is SQL-99.
SQL provides a mechanism for creating tables, each of which contains rows and columns of information that are collectively assembled into a database. Ideally, the tables are “normalized” in that the structure of the tables avoids data redundancy and allows the resulting data model to be mapped to many different physical database designs. In order to avoid redundancy, yet still be able to display data from multiple tables, SQL provides a mechanism called a join. There are many types of joins that provide various results. Some types of joining, however, have some important performance and cost implications.
Once multiple tables are involved, performance can be affected significantly by the execution plan (i.e., in the specific way the tables are joined). Join performance is much more acute when the tables are large, such as in data warehouse applications. The quest for performance has encouraged optimization of joining techniques and the development of the sort merge join, the nested loops join, and the hash join.
Hashing translates an index, or a join-column value, to an offset and then to a database address. If a row needs to be selected on the basis of a supplied index value, it can be done by converting the index through a hash algorithm, to an offset that can then be added to the “row identifier” of the first row to provide the address of the block where the information is stored. Thus, a row can be identified through the key value without applying an index and without having to perform a full table scan. This mechanism can be used in a hash cluster, which contains rows with the same hash value. In certain circumstances, hash clusters can provide considerable performance advantages over indexing.
In situations where the SQL query contains an inclusion or exclusion condition (i.e., an “IN” or “NOT IN” condition, respectively), a merge-join is normally performed. Typically, the merge-join requires two full table scans and a sort operation. Such sort operations typically require significant resources and add significantly to the cost of the operation. Employing a hash join would reduce the resources necessary to perform such an operation. There is, therefor, a need in the art for a hash join method that allows for inclusion/exclusion conditions in the SQL statement.