Query processing has been optimized for disk-based systems, because these systems can hold the very large tables on which the processing operates. A common operation in query processing is joining these large tables, but a join may incur many trips to the disk system, reducing performance. Locating the tables in memory, with multiple servers providing the large amounts of memory needed, improves performance. However, the higher performance comes at the price of high power consumption by the servers.
An alternative to multiple servers is a cluster of low power nodes, each of which has a low power processor, a modest amount of memory, and no persistent storage that would virtualize the memory. The cluster, however, can have a very large number of nodes, in fact thousands. The aggregate memory and processing power of the large number of nodes provide the benefits of multiple servers but at low power.
Given the cluster's high processing power and low power consumption, it is desirable to optimize query processing for a cluster so that it can handle even larger tables with high performance. Because a cluster lacks persistent storage and access to transaction logs, the cluster cannot take on all of the database management tasks demanded by query processing. The cluster has to interface with a traditional relational database management server (RDBMS) to obtain the tables or portions of tables on which the cluster operates, and it has to rely on the RDBMS to maintain transactional consistency. The heterogeneous system, comprising the traditional RDBMS and the cluster, offers the possibility of higher performance and low power for query processing. To obtain the most performance from such a system, a mechanism is needed to optimally allocate the query processing, such as join operations, between the cluster and the RDBMS.
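As an illustration only, the allocation problem described above can be sketched as a toy cost comparison. The sketch below is not the mechanism of this disclosure; every name, parameter, and cost formula in it (JoinEstimate, ClusterProfile, choose_join_site, the throughput figures) is a hypothetical assumption introduced for the example. It assumes that a join can run on the cluster only if both tables fit in the aggregate node memory, since the nodes have no persistent storage to spill to, and that the cluster's cost includes the time to ship the tables from the RDBMS.

```python
from dataclasses import dataclass


@dataclass
class JoinEstimate:
    """Hypothetical size estimate for the two inputs of one join, in bytes."""
    left_bytes: int
    right_bytes: int


@dataclass
class ClusterProfile:
    """Hypothetical description of the low power cluster."""
    nodes: int
    memory_per_node: int               # bytes usable for table data per node
    transfer_bytes_per_s: float        # assumed RDBMS-to-cluster bandwidth
    join_bytes_per_s_per_node: float   # assumed in-memory join rate per node


def choose_join_site(join: JoinEstimate,
                     cluster: ClusterProfile,
                     rdbms_join_bytes_per_s: float) -> str:
    """Return "cluster" or "rdbms" under a deliberately simple cost model."""
    total = join.left_bytes + join.right_bytes

    # The nodes have no persistent storage, so the inputs must fit in the
    # aggregate memory of the cluster or the join stays on the RDBMS.
    if total > cluster.nodes * cluster.memory_per_node:
        return "rdbms"

    # Assumed RDBMS cost: one disk-bound pass at a fixed effective rate.
    rdbms_cost = total / rdbms_join_bytes_per_s

    # Assumed cluster cost: ship the tables, then join in parallel in memory.
    transfer_cost = total / cluster.transfer_bytes_per_s
    compute_cost = total / (cluster.nodes * cluster.join_bytes_per_s_per_node)
    cluster_cost = transfer_cost + compute_cost

    return "cluster" if cluster_cost < rdbms_cost else "rdbms"


if __name__ == "__main__":
    cluster = ClusterProfile(nodes=1000,
                             memory_per_node=2**30,
                             transfer_bytes_per_s=1e9,
                             join_bytes_per_s_per_node=1e8)
    join = JoinEstimate(left_bytes=60 * 10**9, right_bytes=40 * 10**9)
    print(choose_join_site(join, cluster, rdbms_join_bytes_per_s=5e7))
```

A real allocator would also have to weigh factors this sketch ignores, such as join selectivity, data already resident on the cluster, and the cost of returning results to the RDBMS.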