Massively parallel processing (MPP) is the coordinated processing of a program by multiple processors, with each processor working on a different part of the program. Each processor uses its own operating system and memory resources, and the processors communicate with one another to complete a task.
An MPP database system is based on a shared-nothing architecture: the tables of its databases are partitioned into segments that are distributed to different processing nodes, and no data is shared among the processing nodes. When a database query arrives, its work is divided among the processing nodes according to a data distribution plan and an optimized execution plan. The processing entities in each processing node manage only their own portion of the data, although they may communicate with one another to exchange necessary information while executing their work. A query may be divided into multiple sub-queries, which may be executed in parallel, or in some optimal order, on some or all of the processing nodes. The results of the sub-queries may be aggregated and further processed, and additional sub-queries may then be executed based on those results.
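A minimal sketch of this shared-nothing flow, assuming hash distribution on a single distribution key and using threads in one Python process to stand in for processing nodes. The helper names (`node_for`, `distribute`, `local_sum_by`, `parallel_sum_by`) and the "orders" table are hypothetical. Each simulated node aggregates only its own segment, and a coordinator merges the partial results, mirroring how sub-query results are aggregated and further processed.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def node_for(key, num_nodes):
    """Map a distribution-key value to a node (assumed rule: hash mod node count)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(rows, key_col, num_nodes):
    """Partition a table into per-node segments; no row is stored on two nodes."""
    segments = [[] for _ in range(num_nodes)]
    for row in rows:
        segments[node_for(row[key_col], num_nodes)].append(row)
    return segments


def local_sum_by(segment, group_col, value_col):
    """Sub-query on one node: a partial SUM ... GROUP BY over its own segment only."""
    partial = defaultdict(int)
    for row in segment:
        partial[row[group_col]] += row[value_col]
    return partial


def parallel_sum_by(segments, group_col, value_col):
    """Coordinator: run one sub-query per node in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(
            pool.map(lambda seg: local_sum_by(seg, group_col, value_col), segments)
        )
    total = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)


# Hypothetical "orders" table distributed on order_id but grouped by region,
# so each region's rows are spread across nodes and must be merged.
orders = [
    {"order_id": i, "region": ("east", "west", "north")[i % 3], "amount": i}
    for i in range(1, 11)
]
segments = distribute(orders, "order_id", num_nodes=4)
result = parallel_sum_by(segments, "region", "amount")
print(sorted(result.items()))  # [('east', 18), ('north', 15), ('west', 22)]
```

Because the distribution key (`order_id`) differs from the grouping column (`region`), no single node holds a complete group, which is why the partial results have to be combined after the parallel step.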
One of the persistent challenges of an MPP database system is setting up the distributed system, which includes configuring the machines, creating the database, partitioning the tables, and distributing the segments. How the data is distributed, and how closely that distribution aligns with the business logic, largely determines the overall performance of the system.
A traditional MPP database system allows a database administrator to create a database and distribute its data (i.e., create database partitions) across a fixed number of processors that are set up ahead of time. Unfortunately, the number of partitions a database administrator can create is fixed and corresponds directly to the number of processors available in the system; in other words, the number of partitions always equals the number of processors. If additional partitions are desired, the entire process of setting up instances and partitions must be repeated.
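To make the coupling between partitions and processors concrete, the following sketch assumes a hash-modulo placement rule in which the partition count always equals the processor count. That rule and the helper names `partition_for` and `fraction_relocated` are illustrative assumptions, not the scheme of any particular MPP product. The sketch measures how many rows would change partition if one processor were added.

```python
import hashlib


def partition_for(key, num_processors):
    """Assumed placement rule: one partition per processor, chosen by hash mod count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_processors


def fraction_relocated(keys, old_count, new_count):
    """Share of rows whose partition changes when the processor count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys if partition_for(k, old_count) != partition_for(k, new_count)
    )
    return moved / len(keys)


keys = range(100_000)
share = fraction_relocated(keys, old_count=4, new_count=5)
print(f"Growing from 4 to 5 processors relocates {share:.0%} of rows")
```

With uniform hashing, a row keeps its partition only when h mod 4 equals h mod 5, which holds for 4 of every 20 residues of h mod 20, so roughly 80% of rows move. Under this assumed rule, adding even one processor effectively means redoing the whole distribution.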