Massively parallel processing (MPP) systems have been widely adopted in recent years. MPP systems are distributed systems that include multiple independent, network-connected nodes (e.g., compute nodes). Each node is self-sufficient in that it includes its own processor, memory, and operating system, among other things. Employing a plurality of such nodes enables high-scale parallel processing. MPP systems are also referred to as “loosely-coupled” or “shared nothing” systems because the nodes are independent, communicate over a network, and do not share processors, memory, or storage.
Although not limited thereto, MPP systems are typically used as data warehouses. That is, MPP systems are employed to manage and query vast amounts of data. By contrast, consider a data warehouse hosted on a single machine, or node. In this scenario, scaling problems can arise when the quantity of data exceeds what one machine can store or process. A parallel data warehouse, or in other words a data warehouse embodied as an MPP system, addresses this problem by enabling scale out across many machines, or nodes, while still providing the illusion of a single database to a user. This illusion is called a single system image. The image allows a user to work as though one giant database holds all of the data when in fact the data is distributed over numerous databases.
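
By way of illustration only, the single system image described above can be sketched in a few lines of Python. The sketch is not drawn from any particular MPP product. The names `Node` and `ParallelWarehouse`, the placement of rows by key modulo node count, and the scatter-and-combine query are all assumptions made for the example. Each node holds only its own rows, yet the caller issues a query as though against one database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical independent node holding only its own slice of the rows."""
    rows: list = field(default_factory=list)

    def count_where(self, predicate):
        # Each node evaluates the query against its local data only.
        return sum(1 for row in self.rows if predicate(row))


class ParallelWarehouse:
    """Hypothetical front end that makes many nodes look like one database."""

    def __init__(self, node_count):
        self.nodes = [Node() for _ in range(node_count)]

    def insert(self, key, value):
        # Assumption: a row is placed on a node by integer key modulo node count.
        self.nodes[key % len(self.nodes)].rows.append((key, value))

    def count_where(self, predicate):
        # Send the query to every node, then combine the partial counts.
        return sum(node.count_where(predicate) for node in self.nodes)


warehouse = ParallelWarehouse(node_count=4)
for key in range(100):
    warehouse.insert(key, key * 10)

# The caller queries one logical database and never addresses a node directly.
total = warehouse.count_where(lambda row: row[1] >= 500)
per_node = [len(node.rows) for node in warehouse.nodes]
```

In this sketch the user sees a single `count_where` call, while the data and the work are spread over four nodes. An actual system would send the per-node queries over a network and run them concurrently, which the sketch omits for brevity.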