A massively parallel processing (MPP) database is a database in which a large number of processors perform a set of computations in parallel. In an MPP system, a program is processed by multiple processors in a coordinated manner, with each processor working on a different part of the program, on different data, or both.
An MPP database system is based on a shared-nothing architecture, in which the tables of the database are divided into partitions and the partitions are distributed to different processing nodes. When a database query arrives, its tasks are divided and assigned to the processing nodes according to the data distribution and an optimized execution plan. The processing entities in each processing node manage only their own portion of the data. However, the processing entities may communicate with one another to exchange necessary information during execution. A query may be divided into multiple sub-queries, and the sub-queries may be executed in parallel or in some optimal order on some or all of the processing nodes. The results of the sub-queries may be aggregated and further processed. Further sub-queries may then be executed based on those results.
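The flow described above (partition a table, run a sub-query on each partition, then aggregate the partial results) can be illustrated with a minimal Python sketch. It is not the implementation of any particular MPP system. The names `node_for`, `distribute`, `local_subquery`, and `run_query`, the `customer_id` and `amount` columns, and the four-node cluster are all hypothetical, and threads within one process stand in for separate processing nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def node_for(key, num_nodes):
    """Map a distribution key to a node index with a deterministic hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, num_nodes, key="customer_id"):
    """Split a table into one partition per node (hypothetical hash distribution)."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[node_for(row[key], num_nodes)].append(row)
    return partitions


def local_subquery(partition):
    """Sub-query run by one node: partial sum and row count over its own rows only."""
    return sum(row["amount"] for row in partition), len(partition)


def run_query(partitions):
    """Run the sub-queries in parallel, then aggregate the partial results."""
    # Assumption: worker threads stand in for independent processing nodes.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_subquery, partitions))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    average = total / count if count else None
    return total, count, average


# Hypothetical "orders" table with eight rows.
orders = [{"customer_id": i, "amount": 10 * i} for i in range(1, 9)]
partitions = distribute(orders, num_nodes=4)
total, count, average = run_query(partitions)
print(total, count, average)
```

Each node returns a partial sum and a count rather than a local average, because local averages cannot be combined correctly when the partitions differ in size. A second round of sub-queries, for example selecting the rows whose amount exceeds the global average, could then be issued using the aggregated result.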