Query processing has been optimized for disk-based systems, because these systems can hold very large tables on which the processing operates. A common operation in query processing is a join operation on very large tables. Such a join operation may incur many I/O operations to the disk system, reducing performance. An alternative to disk-based systems is a cluster of computing nodes, each of which has a processor, a modest amount of memory, and non-persistent storage for storing table data accessed by query processing, and all of which are connected together through a network. Although each node is modest, a cluster can have a very large number of nodes, in fact thousands. The total memory and processing power of the large number of nodes of a cluster provide an advantage over disk-based systems, particularly when the nodes perform operations for query processing in parallel. Such a cluster of computing nodes may be used for a database management system and is referred to herein as a “cDBMS.”
However, since the computing nodes of a cluster have relatively small memory compared to the disk storage of disk-based systems, each node may not be able to store all the database objects required for a join operation in a query. Accordingly, the database objects or portions thereof have to be distributed or replicated among the nodes in the cluster, perhaps creating an uneven distribution of database object data across the cluster. Such an uneven distribution overloads one or more nodes in the cluster, reducing the overall performance of the join operation and diminishing the advantage that parallel processing on the cluster has over disk-based systems.
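As a purely illustrative sketch of how such an uneven distribution can arise, the following Python fragment assigns rows to nodes by join key and counts the rows each node receives. The partitioning rule (join key modulo node count), the function names `partition_by_key` and `skew_factor`, the node count, and the sample data are all assumptions chosen for illustration; they are not taken from any particular cDBMS.

```python
from collections import Counter


def partition_by_key(keys, num_nodes):
    """Assign each row to a node by its join key and return per-node row counts.

    Assumption: rows are placed with a simple modulo rule on an integer key.
    """
    load = Counter({node: 0 for node in range(num_nodes)})
    for key in keys:
        load[key % num_nodes] += 1
    return load


def skew_factor(load):
    """Ratio of the busiest node's row count to the average row count."""
    average = sum(load.values()) / len(load)
    return max(load.values()) / average


# Hypothetical table of 1,000 rows in which 700 rows share the join key 7.
keys = [7] * 700 + list(range(300))
load = partition_by_key(keys, num_nodes=4)

for node in sorted(load):
    print(f"node {node}: {load[node]} rows")
print(f"skew factor: {skew_factor(load):.2f}")
```

Under these assumptions, the node that receives the frequently occurring key holds several times the average number of rows, so the join cannot complete until that one node finishes, regardless of how quickly the other nodes complete their shares.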
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.