1. Field of the Invention
The present invention generally relates to managing relational databases in a multi-processor environment and, more particularly, to joining two database relations on a common field in a parallel relational database environment in the presence of data skew by partitioning the join operation into separate jobs and optimally scheduling the jobs among a plurality of processors.
2. Description of the Prior Art
A common operation in relational database systems is the natural join of two relations on respective columns defined over a common domain. See, for example, the description of the natural join at pages 209 and 210 of An Introduction to Database Systems, Vol. 1, 3rd Ed., by C. Date, Addison-Wesley (1982). The result of the join is a new relation in which each row is the concatenation of two rows, one from each of the original relations, such that both rows have the same value in their respective join columns.
One popular algorithm for computing the join of two relations is the sort-merge technique as described by M. Blasgen and K. Eswaran in "Storage and Access in Relational Databases", IBM Systems Journal, vol. 4, pp. 363 et seq. (1977). It can be summarized briefly as follows: First, each of the relations is sorted (if necessary) according to the join column. Second, the two sorted relations are scanned in the obvious interlocked sequence and merged for rows which have equal values.
When sort-merge joins are performed in parallel on a multiple processor database system, there exists a problem of data skew that might exist in the join columns of the relations. In general, the issue of skew is not addressed by the join algorithms described in the literature. An early article on parallel sort-merge join is "Parallel Algorithms for the Execution of Relational Database Operations" by D. Bitton, H. Boral, D. J. DeWitt, and W. K. Wilkinson, ACM Trans. on Database Systems, vol. 8, no. 3, Sep. 1983, pp. 324-353. Bitton et al. propose two external parallel sort algorithms which they call parallel binary merge and block bitonic sort. In both algorithms, sorted runs are written to disk and two-way merges are used to merge runs from disk. The merge tree is mapped to different processors with the final merge being sequential.
In "Join and Semijoin Algorithms for a Multiprocessor Database Machine", ACM Trans. on Database Machines, vol. 9, no. 1, Mar. 1984, pp. 133-161, P. Valduriez and G. Gardarin describe the algorithm generalized to a k-way merge. The result is p lists which are merged (assuming p&lt;k) sequentially on a single processor.
In the paper by J. P. Richardson, H. Lu and K. Mikkilineni entitled "Design and Evaluation of Parallel Pipelined Join Algorithms", ACM SIGMOD 1987, San Francisco, May 1987, pp. 160-169, a method to parallelize the merge-join operation is described. In the method of Richardson et al., the relations T.sub.1 and T.sub.2 are merged to m.sub.1 and m.sub.2 runs. Assuming that T.sub.1 is the larger relation, each run of T.sub.1 is assigned to a processor. Each processor merges m.sub.2 runs of T.sub.2 (i.e., the final merge of T.sub.2 is repeated, at least as many times as there are processors) and merge-joined with runs of T.sub.1 assigned to that processor. This method is good when the projections of the two relations to be joined are such that one of them is very small. They also describe another version of their algorithm that is useful if one relation is small.
S. G. Akl and N. Santoro in "Optimal Parallel Merging and Sorting Without Memory Conflicts", IEEE Trans. on Comp., vol. C-36, no. 11, Nov. 1987, pp. 1367-1369, consider merging two sorted lists in parallel by partitioning each of the two lists.
Another popular algorithm for computing the join of two relations is the hash join technique described by D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Maralikrishna in "Multiprocessor Hash-based Join Algorithms", Proc. 11th VLDB (1985). For a multi-processor system, it can be summarized briefly as follows: First, both relations are hashed (if necessary) into hash partitions according to the join columns. The number of hash partitions generally is set equal to the number of processors. Then the hash partitions are distributed among the processors, so that the corresponding partitions of the two relations reside on the same processor. Second, the corresponding hash partitions of the two relations are joined together.
Although performance of a join query may be sped up by the use of multiple processors, the speed up from this kind of conventional join algorithm can be very limited in the presence of data skew as described by M. S. Lakshmi and P. S. Yu in "Effect of Skew on Join Performance in Parallel Architectures", Proc. Intl. Symposium on Databases in Parallel and Distributed Database Systems (1988). In "A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment", Proc. ACM Sigmod Conference (1989), D. A. Schneider and D. J. DeWitt evaluate the performance of four parallel join algorithms. They suggest that in the case of high data skew, an algorithm other than hash join should be considered. In "Removing Skew Effect in Join Operation on Parallel Processors", Technical Report CSD-890027, UCLA (1989), R. C. Hu and R. R. Muntz propose a simple hash join algorithm to identify the biggest skew element and assign multiple processors to handle it.