In commercial data processing, a large share of computation is devoted to evaluating joins. Joins are expensive in both memory consumption and processing time, and a common technique for reducing the amount of data they must handle is the semi-join.
A join (e.g., an SQL join) combines two or more tables in a database, producing a new table that can be stored or used as an intermediate result of a more complex computation. For two tables, the join combines their fields by matching values that are common to both. A semi-join is a binary operator on two relations: if the relations are R and S, the semi-join of R with S is the set of all rows in R for which some row in S has equal values on their common attributes. Here a relation is a data structure consisting of a heading (an unordered set of attributes, shown as the columns of a table) and a body (an unordered set of rows, all of the same type). A row, or tuple, represents an ordered list of attribute values; in general, an n-tuple is a sequence (ordered list) of n elements, where n is a positive integer.
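The two operators can be made concrete with a small sketch. It assumes relations are represented as Python lists of dicts (one dict per row, keys as attribute names) and that the common attributes are passed explicitly as `on`; the names `join`, `semi_join`, `customers`, and `orders` are hypothetical and not taken from any library.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]  # hypothetical representation: attribute name -> value

def key(row: Row, on: Sequence[str]) -> Tuple[Any, ...]:
    """Values of the common (join) attributes, in a fixed order."""
    return tuple(row[a] for a in on)

def join(r: List[Row], s: List[Row], on: Sequence[str]) -> List[Row]:
    """Equi-join: one combined row for every matching (r_row, s_row) pair."""
    return [{**rr, **sr} for rr in r for sr in s if key(rr, on) == key(sr, on)]

def semi_join(r: List[Row], s: List[Row], on: Sequence[str]) -> List[Row]:
    """Rows of r with at least one match in s; a row is never repeated per match."""
    s_keys = {key(sr, on) for sr in s}
    return [rr for rr in r if key(rr, on) in s_keys]

customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Cy"}]
orders = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 1}, {"oid": 12, "cid": 2}]

print(len(join(customers, orders, ["cid"])))                       # 3: Ann twice, Bob once
print([c["name"] for c in semi_join(customers, orders, ["cid"])])  # ['Ann', 'Bob']
```

The semi-join only needs the set of join-key values from S, which is why it can shrink R cheaply before the full join is computed.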
A semi-join between two tables consists of the rows of the first table for which at least one match is found in the second table. For relations R and S, the semi-join of R with S differs from the join of R and S in what it is a subset of: the semi-join is a subset of R alone, whereas the join is a subset of the product R×S. Being a subset of R, the semi-join contains every row of R at most once: even if S contains two matches for a row of R, only one copy of that row is retained. Conceptually, if J is the join of R and S, the semi-join is the projection of J onto the attributes of R.
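The projection view can be checked with a short sketch, again assuming the hypothetical list-of-dicts representation. It computes the join J first and then projects it onto R's attributes with duplicate elimination (set semantics); that duplicate elimination is what keeps each row of R only once, even though one customer matches two orders.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # hypothetical representation: attribute name -> value

def join(r: List[Row], s: List[Row], on: Sequence[str]) -> List[Row]:
    """Equi-join on the attributes in `on`: one combined row per matching pair."""
    return [{**rr, **sr} for rr in r for sr in s
            if all(rr[a] == sr[a] for a in on)]

def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Projection onto `attrs`, dropping duplicate rows (first occurrence kept)."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Cy"}]
orders = [{"oid": 10, "cid": 1}, {"oid": 11, "cid": 1}, {"oid": 12, "cid": 2}]

j = join(customers, orders, ["cid"])   # J: Ann appears twice
semi = project(j, ["cid", "name"])     # project J onto the heading of R (customers)

print(len(j))   # 3
print(semi)     # [{'cid': 1, 'name': 'Ann'}, {'cid': 2, 'name': 'Bob'}]
```

Computing a semi-join this way is only a conceptual check: it materializes the full join, which is exactly the cost a semi-join is meant to avoid.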
A join query is typically processed as follows: first, semi-joins are applied to reduce the sizes of the joining relations; then the reduced relations are assembled to compute the join; finally, for every tuple in the join, the attributes referenced by the expressions in the SELECT clause are projected, the expressions are evaluated, and the results are returned to the user.
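The three steps can be sketched for the simple case of two relations. This is a minimal illustration under stated assumptions, not a query engine: relations are hypothetical lists of dicts, the SELECT clause is modeled as a Python callable `select`, and the join step uses a simple hash join, which is only one of several possible join algorithms. For two relations, one semi-join in each direction leaves only rows that take part in the join; queries over many relations need a more careful ordering of semi-joins, which this sketch does not attempt.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]  # hypothetical representation: attribute name -> value

def _key(row: Row, on: Sequence[str]) -> Tuple[Any, ...]:
    return tuple(row[a] for a in on)

def semi_join(r: List[Row], s: List[Row], on: Sequence[str]) -> List[Row]:
    """Keep the rows of r that match at least one row of s on `on`."""
    s_keys = {_key(row, on) for row in s}
    return [row for row in r if _key(row, on) in s_keys]

def join(r: List[Row], s: List[Row], on: Sequence[str]) -> List[Row]:
    """Hash join: index s by its join key, then probe with each row of r."""
    index: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in s:
        index.setdefault(_key(row, on), []).append(row)
    return [{**rr, **sr} for rr in r for sr in index.get(_key(rr, on), [])]

def evaluate(r: List[Row], s: List[Row], on: Sequence[str],
             select: Callable[[Row], Any]) -> List[Any]:
    # Step 1: semi-join reductions, one pass in each direction.
    r_red = semi_join(r, s, on)
    s_red = semi_join(s, r_red, on)
    # Step 2: assemble the reduced relations into the join.
    joined = join(r_red, s_red, on)
    # Step 3: project and evaluate the SELECT expressions for each joined tuple.
    return [select(t) for t in joined]

customers = [{"cid": 1, "name": "Ann"}, {"cid": 2, "name": "Bob"}, {"cid": 3, "name": "Cy"}]
orders = [
    {"oid": 10, "cid": 1, "qty": 2, "price": 5},
    {"oid": 11, "cid": 1, "qty": 1, "price": 7},
    {"oid": 12, "cid": 2, "qty": 3, "price": 4},
    {"oid": 13, "cid": 9, "qty": 1, "price": 1},  # no matching customer
]

# Roughly: SELECT name, qty * price FROM customers JOIN orders USING (cid)
print(evaluate(customers, orders, ["cid"], lambda t: (t["name"], t["qty"] * t["price"])))
# [('Ann', 10), ('Ann', 7), ('Bob', 12)]
```

Here the reductions drop customer Cy and order 13 before the join is assembled, so the join step only touches rows that contribute to the result.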