Massively parallel processing (MPP) is the coordinated processing of a program by multiple processors, with each processor working on a different part of the program. The processors communicate with one another to complete a task but otherwise rely on their own operating systems and memory resources. MPP database systems are based on shared-nothing architectures, in which the database is partitioned into segments that are distributed to a plurality of processors (data nodes) for parallel processing. Because each data node stores only a portion of the MPP database, the data nodes can perform database operations (e.g., search, scan, etc.) on their respective portions concurrently, and therefore more quickly than would otherwise be possible in a sequential processing system.
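To illustrate the shared-nothing model described above, the following Python sketch hash-partitions rows across a fixed number of data nodes and scans each segment concurrently. It is a minimal illustration under stated assumptions, not an implementation of any particular MPP system: the names `NUM_NODES`, `node_for`, `partition`, `local_scan`, and `parallel_scan` are hypothetical, CRC-32 stands in for whatever distribution function a real system uses, and threads stand in for independent processors with their own memory.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_NODES = 4  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a distribution key to a data node with a deterministic hash."""
    return crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes: int = NUM_NODES):
    """Split rows into one segment per node; each row lives on exactly one node."""
    segments = [[] for _ in range(num_nodes)]
    for row in rows:
        segments[node_for(row["id"], num_nodes)].append(row)
    return segments


def local_scan(segment, predicate):
    """Work performed by a single data node: scan only its own segment."""
    return [row for row in segment if predicate(row)]


def parallel_scan(segments, predicate):
    """Run local_scan on every segment concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda seg: local_scan(seg, predicate), segments))
    return [row for part in partials for row in part]


if __name__ == "__main__":
    rows = [{"id": f"cust-{i}", "balance": i * 10} for i in range(100)]
    segments = partition(rows)
    matches = parallel_scan(segments, lambda r: r["balance"] >= 500)
    print(len(matches))  # → 50
```

A deterministic hash such as `zlib.crc32` is used here rather than Python's built-in `hash()`, whose values for strings are randomized per process and would therefore place the same key on different nodes across runs.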
Clients access information in MPP databases by interacting with an MPP coordinator, which is a process configured to receive and respond to queries. More specifically, for each issued query, the MPP coordinator consults a global catalog to develop a single query plan (referred to herein as a ‘global execution plan’), which is then distributed to each of the MPP data nodes for local execution. Notably, the MPP coordinator's global view of resources and data distribution may lack local configuration information and/or statistics local to the MPP data nodes; instead, the coordinator may make generalized assumptions about, inter alia, local data distribution and/or resource availability. For instance, the MPP coordinator may assume that data is evenly distributed among the MPP data nodes and/or that the data nodes' resources (e.g., processing or other resources) are unconstrained. As a result, the global execution plan may be sub-optimal for one or more of the MPP data nodes, which may lead to inefficient execution of the global execution plan. Accordingly, mechanisms for improving query optimization in MPP database systems are desired.
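The effect of the coordinator's uniform-distribution and unconstrained-resource assumptions can be shown with simple arithmetic. The sketch below assumes a hypothetical linear cost model in which a node's scan time is its row count divided by its throughput, and a parallel step finishes only when its slowest node finishes. The function names `uniform_estimate` and `completion_time` and all figures are illustrative, not taken from any real coordinator.

```python
def uniform_estimate(total_rows: int, num_nodes: int) -> list[float]:
    """Coordinator's generalized assumption: every node holds an equal share of rows."""
    return [total_rows / num_nodes] * num_nodes


def completion_time(rows_per_node, rows_per_second) -> float:
    """Hypothetical linear cost model: a parallel scan ends when its slowest node ends."""
    return max(rows / rate for rows, rate in zip(rows_per_node, rows_per_second))


if __name__ == "__main__":
    actual_rows = [100_000, 100_000, 100_000, 700_000]  # skewed: node 3 holds 70%
    unconstrained = [50_000] * 4                         # rows/s the coordinator assumes
    constrained = [50_000, 50_000, 50_000, 25_000]       # node 3 is resource-limited

    planned = completion_time(uniform_estimate(sum(actual_rows), 4), unconstrained)
    skewed = completion_time(actual_rows, unconstrained)
    skewed_and_constrained = completion_time(actual_rows, constrained)
    print(planned, skewed, skewed_and_constrained)  # → 5.0 14.0 28.0
```

Under these assumed figures, the plan is costed at 5 seconds but takes 14 seconds on the skewed data, and 28 seconds if the overloaded node also has half the throughput, because the single busiest node dominates completion time.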