A primary goal of data distribution design in a massively parallel processing (MPP) database is the even distribution of data among the separate nodes of the system. In such a system, multiple processors, each with its own independent operating system and memory resources, are coordinated to process a program in parallel. Such a system may be referred to as “shared-nothing”: tables of the database(s) are partitioned into segments and distributed among different processing nodes, with no data shared between processing nodes. Data is partitioned across the processing nodes such that each node holds a subset of the rows of the tables in the database, and each processing node processes only the rows on its own disks. The MPP database also includes a coordinator system, which hosts the database but does not itself store database records, storing instead metadata about the database. The coordinator system hosts a database engine, the controlling entity of an MPP system; it generates query plans, coordinates query execution by the processing nodes, and aggregates data system-wide. A common messaging interface interconnects all processing nodes and the coordinator system, providing communication among the processing nodes and the coordinator.
When a database query arrives at the coordinator (e.g., from a client system), the query is divided and assigned to the processing nodes according to a data distribution plan and an optimized execution plan. The processing entities in each processing node manage only their portion of the data, but they may communicate with one another to exchange any information needed during execution. A query may be divided into multiple sub-queries, and the sub-queries may be executed in parallel, or in some optimal order, on some or all of the processing nodes. The results of the sub-queries may be aggregated and further processed, and additional sub-queries may subsequently be executed according to those results.
In a conventional MPP database system, entries of a table are generally distributed using one of the following methods: hash; random (e.g., round-robin); range; or list. Currently, many MPP databases distribute data using hash distribution. In hash distribution, the key values of records are hashed into buckets, and the buckets are assigned to processing nodes. With hash distribution, data can be located directly through the key with minimum consumption of system resources. This gives hash distribution a high-performance characteristic, since lookup of a value by its key is straightforward, requiring only calculation of the hash value to determine which node hosts that particular hash bucket. The query is sent to only that processing node, without involving the others. Another common method of distribution for an MPP database is round-robin distribution. With round-robin distribution, data is distributed across all processing nodes of the MPP database, which is useful for distributing records evenly among the nodes in the system. However, a disadvantage of round-robin distribution lies in looking up data records: in general, during a lookup of a record, many (if not all) of the nodes in the system must be visited in order to locate the particular record. Thus, system resource utilization is less efficient for an MPP database using a conventional round-robin distribution scheme than for a lookup on an MPP database using a hash distribution scheme.
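The contrast between the two lookup paths can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular MPP product; the node and bucket counts, key names, and helper functions are illustrative assumptions.

```python
import hashlib

NODE_COUNT = 4
BUCKET_COUNT = 16  # buckets are in turn assigned to nodes

def hash_bucket(key):
    """Map a key value to a bucket via a stable hash."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % BUCKET_COUNT

def hash_node(key):
    """Hash distribution: the key alone determines the hosting node,
    so a key lookup involves exactly one processing node."""
    return hash_bucket(key) % NODE_COUNT

def round_robin_nodes(records):
    """Round-robin distribution: records go to nodes in rotation,
    which balances volume but records nothing about placement by key."""
    return [(i % NODE_COUNT, rec) for i, rec in enumerate(records)]

# Hash lookup: compute the hash, query only the hosting node.
target_node = hash_node("customer_42")

# Round-robin lookup: the key implies nothing about placement,
# so in general every node must be visited.
candidate_nodes = list(range(NODE_COUNT))
```

The asymmetry shown here is the trade-off described above: round-robin gives even placement for free, while hashing buys single-node lookups at the cost of placement being dictated by the key.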
One potential problem for MPP databases that distribute by hash (as well as by range or list) is the occurrence of skew. Even in cases where the distribution of key values is not skewed, some buckets will typically be large while others are small. In most cases it is possible to group several large and small buckets together and assign them to one processing node, the general goal being that every node holds roughly the same overall amount of data. In some cases, however, the data distribution may be skewed, and the records sharing a single key value may not fit in the storage of the designated processing node, while other processing nodes have an abundance of remaining space. Simply adjusting the function by which the hash distribution is made will not address this issue, because all records with the same key value hash to the same bucket; additional measures are needed to address the skew, particularly in the storage of the processing node. Data skew can also increase the processing time required to execute a query, especially a table join.
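The limitation noted above can be demonstrated concretely: since identical key values always collocate under hash distribution, a dominant "hot" key overloads one node no matter how the hash function is chosen. The data below is a synthetic assumption for illustration only.

```python
from collections import Counter

NODE_COUNT = 4

def node_for(key):
    # Simple stand-in for a hash-based placement function.
    return hash(str(key)) % NODE_COUNT

# 1000 records in which one hot key accounts for 700 rows:
keys = ["hot_key"] * 700 + ["key_%d" % i for i in range(300)]
per_node = Counter(node_for(k) for k in keys)

# Whichever node hosts "hot_key" holds at least 700 records,
# regardless of how the remaining keys are spread.
hot_node_load = max(per_node.values())
```

Changing `node_for` reshuffles only which node receives the hot key, not the size of that load, which is why the text above calls for additional measures beyond adjusting the distribution function.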