A massively parallel processing (MPP) database management system is designed for managing and processing very large amounts of data. In general, an MPP database system comprises at least one coordinator node and multiple data processing nodes. Coordinator nodes (or coordinators) are the front end of an MPP database system and coordinate with the data processing nodes (also called processing nodes). Clients connected to an MPP database submit queries to the coordinators, which dispatch the queries to the processing nodes for execution. The coordinator nodes and processing nodes together form an MPP database cluster. In an MPP database, tables are divided into partitions that are distributed to different processing nodes. Each processing node manages and processes its own portion of the data, and the processing nodes may do so in parallel.
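As a rough illustration of the arrangement described above, the following Python sketch distributes the rows of a table to processing nodes by hashing a partition key, and has a coordinator merge the partial results returned by each node. The function names (`node_for_key`, `distribute`, `coordinator_count`), the CRC32-based placement, and the sample `orders` table are assumptions made for this example only; an actual MPP system may use a different distribution scheme.

```python
import zlib
from collections import defaultdict

def node_for_key(key, num_nodes):
    """Map a partition-key value to a processing node index (hypothetical scheme)."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes

def distribute(rows, partition_key, num_nodes):
    """Split a table (a list of dicts) into per-node partitions."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[node_for_key(row[partition_key], num_nodes)].append(row)
    return partitions

def coordinator_count(partitions, predicate):
    """Simulate a coordinator: each node counts its matching rows, then the counts are merged."""
    # In a real cluster the per-node counts would be computed in parallel.
    partial_counts = [sum(1 for r in part if predicate(r)) for part in partitions.values()]
    return sum(partial_counts)

# Illustrative data: 100 orders partitioned on order_id across 4 nodes.
orders = [{"order_id": i, "customer_id": i % 7, "amount": 10 * i} for i in range(100)]
partitions = distribute(orders, "order_id", num_nodes=4)
big_orders = coordinator_count(partitions, lambda r: r["amount"] >= 500)
```

Because each node only scans its own partition, the work per node shrinks as nodes are added, provided the partition key spreads the rows evenly.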
However, a processing node may not have all of the information required to execute a query. For example, in a hash join, the partition key of a table may not be the same as the join key, so that rows to be joined may reside on different processing nodes. Thus, the processing nodes may communicate with one another to exchange the information necessary to complete the processing. In the case of a database join or aggregation operation, if the tables formed during the operation are too large and insufficient memory is available on the processing node performing the operation, the tables (or partitions) are spilled from the memory of the processing node to disk. After spilling the tables to disk, the processing node may proceed with the hash join or aggregation operation.
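The spilling behavior described above can be sketched on a single node as follows. This is a simplified illustration and not any particular system's implementation: the function `hash_join_with_spill`, the row-count memory budget `max_rows_in_memory`, the bucket count, and the policy of spilling the largest resident bucket are all assumptions chosen for the example. Build-side rows are hashed into buckets; when the budget is exceeded, a bucket is written to a temporary file and joined later.

```python
import pickle
import tempfile
from collections import defaultdict

def build_index(rows, join_key):
    """Build an in-memory hash table: join-key value -> list of rows."""
    table = defaultdict(list)
    for r in rows:
        table[r[join_key]].append(r)
    return table

def hash_join_with_spill(build_rows, probe_rows, join_key, max_rows_in_memory, num_buckets=8):
    buckets = defaultdict(list)  # bucket id -> build rows held in memory
    spilled = {}                 # bucket id -> temp file holding spilled build rows
    in_memory = 0

    def bucket_of(row):
        return hash(row[join_key]) % num_buckets

    # Build phase: keep rows in memory until the budget is exceeded, then spill.
    for row in build_rows:
        b = bucket_of(row)
        if b in spilled:
            pickle.dump(row, spilled[b])  # bucket already lives on disk
            continue
        buckets[b].append(row)
        in_memory += 1
        if in_memory > max_rows_in_memory:
            victim = max(buckets, key=lambda k: len(buckets[k]))  # largest resident bucket
            f = tempfile.TemporaryFile()
            for r in buckets.pop(victim):
                pickle.dump(r, f)
                in_memory -= 1
            spilled[victim] = f

    # Probe phase: join against resident buckets now, defer rows for spilled buckets.
    results = []
    deferred = defaultdict(list)  # a real system would spill these probe rows as well
    tables = {b: build_index(rows, join_key) for b, rows in buckets.items()}
    for row in probe_rows:
        b = bucket_of(row)
        if b in spilled:
            deferred[b].append(row)
        else:
            for match in tables.get(b, {}).get(row[join_key], []):
                results.append({**match, **row})

    # Finish: read each spilled bucket back from disk and join its deferred probe rows.
    for b, f in spilled.items():
        f.seek(0)
        rows = []
        while True:
            try:
                rows.append(pickle.load(f))
            except EOFError:
                break
        f.close()
        table = build_index(rows, join_key)
        for row in deferred[b]:
            for match in table.get(row[join_key], []):
                results.append({**match, **row})
    return results, len(spilled)

# Illustrative data: 20 customers joined with 50 payments under a 5-row memory budget.
customers = [{"id": i, "name": "c%d" % i} for i in range(20)]
payments = [{"id": i % 20, "amt": i} for i in range(50)]
joined, spilled_buckets = hash_join_with_spill(customers, payments, "id", max_rows_in_memory=5)
```

The extra write and read of every spilled bucket is the cost that makes spilling slower than a join that fits entirely in memory.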
Spilling, however, is a time-consuming and expensive operation. Moreover, the table partitions may not be evenly distributed across the processing nodes as a result of data skew, thereby resulting in an unbalanced load. With an unbalanced load, some processing nodes have insufficient memory to process their data, while other processing nodes have memory to spare.
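A small numeric sketch may make the effect of data skew concrete. The helper names (`rows_per_node`, `skew_ratio`), the modulo-hash placement, the synthetic key distribution, and the 300-row per-node budget below are assumptions for illustration and do not come from any specific system.

```python
from collections import Counter

def rows_per_node(keys, num_nodes):
    """Count how many rows land on each node under hash(key) % num_nodes placement."""
    counts = Counter(hash(k) % num_nodes for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]

def skew_ratio(loads):
    """Ratio of the heaviest node's load to the mean load (1.0 means perfectly even)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0

# Synthetic skewed data: 700 of 1000 rows share the key 42, the rest are spread out.
keys = [42] * 700 + list(range(300))
loads = rows_per_node(keys, num_nodes=4)

# Hypothetical per-node memory budget, expressed in rows.
budget = 300
must_spill = [n for n, load in enumerate(loads) if load > budget]
spare_rows = {n: budget - load for n, load in enumerate(loads) if load <= budget}
```

In this example a single node receives most of the rows and exceeds the budget, while the remaining nodes each leave most of their budget unused.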