In a shared-nothing partitioned database architecture, table data can be spread across multiple database partitions existing on multiple server machines for flexible workload parallelization. A distribution key is used to distribute table and index data across a set of database partitions. The distribution key for a table in a table space on a partitioned database partition group is specified in the CREATE TABLE statement or the ALTER TABLE statement, and comprises one or more columns. To determine placement of rows in a database partition, a hashing algorithm is applied to all of the columns of the distribution key. This results in the generation of a distribution map index value. The database partition number at that index value in the distribution map identifies the database partition in which the row is to be stored.
Provided that the number of rows for each distribution key value is fairly consistent, the hashing algorithm will result in an even distribution of data rows across the database partitions. However in practice, a distribution key may skew toward a particular distribution key value. In a customer sales tracking database this may occur, for example, when a specific customer, represented by a single customer number, generates an unexpectedly large number of sales. As a result, the index map entry generated by the hashing algorithm for that customer number may resolve to a single database partition for inserting the sales rows, and this effectively prevents query parallelization and results in poor performance. Since it is not possible to alter the distribution key for a table without dropping and recreating the table, performance may be improved if rows associated with a skewed distribution key could be dynamically re-assigned to other database partitions to balance the distribution of data rows.