SQL SET operations are used in queries to find differences (minus), commonalities (intersect), and combinations (union) between multisets of database records. With the minus SET operation, for example, the result set includes unique database records returned by a first SQL SELECT statement that are not returned in a second SQL SELECT statement. Thus, the SQL statement illustrated in Table 1 below outputs records that have a unique product_id in t1 that is not included in t2.
TABLE 1Example query with MINUS operatorSELECT product_id FROM t1MINUSSELECT product_id FROM t2By contrast, the intersect SET operation generates a result set including unique database records returned by both the first SQL SELECT statement and the second SQL SELECT statement. Thus, the SQL statement illustrated in Table 2 below outputs records that have a product_id that is included in both t1 and t2.
TABLE 2Example query with INTERSECT operatorSELECT product_id FROM t1INTERSECTSELECT product_id FROM t2
One approach for evaluating set operations involves using a “unique” operator in the first phase followed by a “combine” operator in the second phase. The “unique” operator sorts and compares records within a data set to identify and remove duplicates. The “combine” operator then performs the MINUS, INTERSECT, or UNION operation. In the queries depicted in Table 1 and Table 2, for example, the first phase would involve performing a “unique” operation on t1 and performing a “unique” operation on t2. The second phase for the queries depicted in Table 1 and Table 2 would involve performing a MINUS and INTERSECT operation, respectively, on the results from the first phase. This approach provides a simple and straightforward way to evaluate set operations. However, this approach may involve multiple resource-intensive steps, with each phase sorting and/or comparing records from potentially large data sets.
Another approach for evaluating set operations comprises performing a “join” of the multisets, followed with the “unique” operation. For the MINUS operation depicted in Table 1, an anti-join is performed to join t1 and t2 to generate a table that includes only those records from t1 for which there is no record in t2 that has an equal product_id value. For the INTERSECT operation depicted in Table 2, a semi-join is performed to generate a table that includes only those records from t1 for which there is a record in t2 that has a matching product_id value. The “unique” operation is then performed to remove duplicate records from the result set generated by the “join” operation. This approach may reduce resources consumed by the “unique” operator if the result set of the “join” operation is small. However, when the “join” operation produces a large data set, this approach may be more resource-intensive than performing the “unique” operation on each individual data set. For example, removing duplicates from a large table generated by the join operation may, in some cases, involve more comparisons than removing duplicates from each individual table before the join, especially when there are a large number of duplicates. In addition, the “join” operator and the “unique” operators are also relatively expensive in terms of CPU and memory usage, with each phase sorting and/or comparing records from potentially large data sets.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.