A relational database management system (RDBMS) often uses query parallelism to reduce query processing time. One common approach is to have several threads follow similar execution paths in parallel on different (possibly overlapping) subsets of the data (work items) for the query. The number of work items can be equal to or greater than the number of execution threads. In the former case, each thread is assigned one work item. In the latter case, there are usually many fine-grained work items, and each thread takes one or more of the remaining work items for processing in a rotating fashion. In some cases, data associated with one or more work items needs to be aggregated during query execution, such as after a sort or materialization, and re-partitioned before being processed further. Fine-grained partitioning is one known way to handle skewed data. However, it addresses skew only by producing more tasks than can be processed at any one time. It also introduces context-switching overhead among these tasks, and it does not guarantee that the partitioning strategy is optimal for downstream tables.
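The case in which work items outnumber threads can be illustrated with a minimal sketch. It assumes a shared queue from which each worker thread repeatedly takes the next remaining work item. The names `run_parallel`, `process`, and `items_per_thread` are hypothetical and chosen only for illustration; they do not describe any particular RDBMS.

```python
import queue
import threading


def run_parallel(work_items, num_threads, process):
    """Apply process(item) to every work item using num_threads worker threads.

    Illustrative assumption: work items sit in a shared queue, and each
    thread keeps taking the next remaining item until the queue is empty.
    """
    pending = queue.Queue()
    for index, item in enumerate(work_items):
        pending.put((index, item))

    results = [None] * len(work_items)
    items_per_thread = [0] * num_threads  # how many items each thread handled

    def worker(thread_id):
        while True:
            try:
                index, item = pending.get_nowait()
            except queue.Empty:
                return  # no work items remain
            results[index] = process(item)
            items_per_thread[thread_id] += 1

    threads = [
        threading.Thread(target=worker, args=(thread_id,))
        for thread_id in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, items_per_thread


# Ten fine-grained work items (100 rows each) shared among four threads.
rows = list(range(1000))
work_items = [rows[start:start + 100] for start in range(0, len(rows), 100)]
results, items_per_thread = run_parallel(work_items, num_threads=4, process=sum)
```

In this sketch a thread that finishes a small work item early simply takes another one, which is how fine-grained partitioning absorbs skew. The cost is more tasks to schedule than there are threads to run them.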
This intra-query partitioning decision is usually made at query optimization time by analyzing statistics on the data or on some subset of it. The actual distributions and correlations of data among tables are usually not known until the query is processed. In addition, tables joined in the later stages of a long join pipeline can introduce significant size skew among work items that is not anticipated at query optimization time. As a result, a partitioning decision made at optimization time may prove suboptimal at execution time.
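The gap between an optimization-time estimate and the execution-time result can be shown with a small worked sketch. It assumes range partitioning over a key domain that the optimizer believes is uniformly distributed, followed by a later join in which one key has a large fan-out. All names and numbers here are hypothetical and made up for illustration.

```python
from collections import Counter


def partition_of(key, num_partitions, key_min, key_max):
    """Range-partition a key, assuming keys are uniform over [key_min, key_max]."""
    width = (key_max - key_min + 1) / num_partitions
    return min(int((key - key_min) / width), num_partitions - 1)


def partition_sizes(keys, num_partitions, key_min, key_max):
    """Count how many rows land in each partition."""
    counts = Counter(
        partition_of(key, num_partitions, key_min, key_max) for key in keys
    )
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition size divided by the mean size (1.0 means balanced)."""
    return max(sizes) / (sum(sizes) / len(sizes))


NUM_PARTITIONS = 4

# Optimization-time view: 100 distinct keys, one row per key.
estimated_keys = list(range(100))
estimated_sizes = partition_sizes(estimated_keys, NUM_PARTITIONS, 0, 99)

# Execution-time view: a table joined later in the pipeline matches
# key 7 with 301 rows, while every other key still matches one row.
fan_out = {key: 1 for key in range(100)}
fan_out[7] = 301
actual_keys = [key for key in range(100) for _ in range(fan_out[key])]
actual_sizes = partition_sizes(actual_keys, NUM_PARTITIONS, 0, 99)

print(estimated_sizes, skew_ratio(estimated_sizes))
print(actual_sizes, skew_ratio(actual_sizes))
```

Under these assumed numbers the estimate gives four equal partitions of 25 rows, while at execution time the first partition holds 325 of the 400 rows, so one work item carries more than three times the average load.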