To scale to very large capacities (e.g., multiple petabytes), shared-nothing parallel data warehouses typically use large clusters of commodity servers with local, direct-attached storage. The physical design of a shared-nothing database typically includes decisions about how data is placed across the cluster of database servers that make up a massively parallel processing (MPP) system. In particular, a distribution policy must typically be specified for each table in the database. The choice of distribution policy can significantly affect the performance of query workloads, because individual queries may have to redistribute data on the fly during execution, for example to join tables whose data is not co-located. Excessive data movement between nodes can flood the network, reducing the effectiveness of the system.