In Massively Parallel Processing (MPP) systems, Business Intelligence (BI) and Enterprise Data Warehouse (EDW) applications process massive amounts of data. The data (a set of relational tables) resides in very large database systems that rely on a large number of central processing units (CPUs) to execute database operations efficiently. Operations are executed in parallel rather than serially.
Load balancing is key to good performance in parallel processing architectures. MPP systems apply a divide-and-conquer approach that attempts to distribute data evenly among the available processors to balance the overall processing load. This approach, however, does not account for skew (an uneven distribution of data values that causes some processors to receive far more rows than others), which can significantly diminish the effectiveness of parallel processing.
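The effect of skew on an even-distribution scheme can be illustrated with a small simulation. The following is a minimal sketch, not a description of any particular MPP system: it assumes rows are assigned to processors by `key % num_procs` (a stand-in for a real hash function), and the names `partition_loads` and `skew_factor` are hypothetical. Because a parallel step finishes only when its busiest processor finishes, the ratio of the maximum load to the average load indicates how much skew lengthens elapsed time.

```python
def partition_loads(keys, num_procs):
    """Count the rows assigned to each processor.

    Hypothetical assignment rule: a row with key k goes to processor
    k % num_procs (standing in for a hash-partitioning function).
    """
    loads = [0] * num_procs
    for k in keys:
        loads[k % num_procs] += 1
    return loads


def skew_factor(loads):
    """Ratio of the busiest processor's load to the average load.

    1.0 means perfectly balanced; larger values mean the slowest
    processor dominates the elapsed time of the parallel step.
    """
    avg = sum(loads) / len(loads)
    return max(loads) / avg


P = 8
uniform = list(range(8000))                   # 8000 distinct keys, one row each
skewed = [0] * 4000 + list(range(1, 4001))    # key 0 appears in half of the rows

for name, keys in (("uniform", uniform), ("skewed", skewed)):
    loads = partition_loads(keys, P)
    print(name, loads, skew_factor(loads))
```

With the uniform keys every processor receives 1000 rows (skew factor 1.0). With the skewed keys, processor 0 receives all 4000 copies of key 0 plus its share of the remaining keys (4500 rows in total), so the step takes roughly 4.5 times as long as a balanced one even though the other seven processors each hold only 500 rows.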
The accuracy of the query plan costing used to optimize the performance of parallel database operations is adversely affected by the presence of skew, or by situations where the ratio of the number of distinct values to the total number of parallel processors (D/P) is low. Skew diminishes the benefit of parallel processing. In fact, under these conditions the adverse effect grows as parallelism increases, progressively degrading scalability.
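Why a low D/P ratio misleads plan costing, and why the error grows with parallelism, can be sketched numerically. The sketch below rests on several assumptions not stated in the text: rows are spread evenly across the D distinct values, every row with a given value is sent to the same processor (`value % num_procs`, standing in for value-based hash partitioning), and the optimizer uses the naive estimate of `total_rows / P` rows per processor. The function names are hypothetical.

```python
def busiest_load(total_rows, num_distinct, num_procs):
    """Rows on the busiest processor.

    Assumes rows are spread evenly over num_distinct values and that all
    rows sharing a value land on processor (value % num_procs).
    """
    rows_per_value = total_rows / num_distinct
    values_per_proc = [0] * num_procs
    for v in range(num_distinct):
        values_per_proc[v % num_procs] += 1
    return max(values_per_proc) * rows_per_value


def costing_error(total_rows, num_distinct, num_procs):
    """Factor by which the busiest processor's load exceeds the naive
    per-processor estimate total_rows / num_procs."""
    naive = total_rows / num_procs
    return busiest_load(total_rows, num_distinct, num_procs) / naive


# Hold D = 10 distinct values fixed and increase parallelism P.
for p in (4, 8, 16, 32, 64):
    print(f"P={p:3d}  D/P={10 / p:5.3f}  error={costing_error(1_000_000, 10, p):.2f}")
```

Under these assumptions the error factor is 1.2 at P = 4 but reaches 6.4 at P = 64: once P exceeds D, at most D processors receive any rows, so adding processors lowers the naive estimate without reducing the actual work on the busiest processor. With a high D/P ratio (for example D = 10,000 and P = 8) the factor returns to 1.0.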