The present invention relates generally to the field of database computing systems. More specifically, the present invention relates to a method and system for avoiding the processing of skewed intermediate data in a massively parallel processing (MPP) environment by using range partitioning to cluster data within processing units for further processing.
A database engine (or storage engine) is the underlying software component that a database management system (DBMS) uses to create, read, update, and delete (CRUD) data in a database computing system. A database engine can also be adapted to prepare an execution plan for a query, and that plan can be optimized before the query is executed. Such optimization is performed based on several pieces of processed information. Further, in a massively parallel processing (MPP) environment, an important dimension is the distribution of the data of the database computing system among all of its processing units. In an MPP environment, data can be redistributed several times during the execution of a single query. For example, data redistribution is required for a JOIN operation. The JOIN operator is one of the relational operations available in relational databases, and it specifies how the tables in a query are related to one another. Such redistribution of data between processing units during query execution may lead to intermediate data skew in the redistributed data. Intermediate data skew refers to cases where one of the processing units receives significantly more data than the other processing units and, as a result, the database computing system requires additional time to complete its processing operations.
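By way of illustration only, the following minimal Python sketch shows how redistributing rows on a join key can produce intermediate data skew. It is not taken from any particular database engine. The function names `redistribute_by_key` and `skew_ratio`, the modulo-based assignment of rows to processing units, the number of processing units, and the sample rows are all assumptions made for this example.

```python
def redistribute_by_key(rows, key_index, num_units):
    """Assign each row to a processing unit using its join key.

    Assumption: keys are non-negative integers and the unit is chosen
    by a simple modulo, standing in for whatever hash function a real
    MPP engine would use.
    """
    units = [[] for _ in range(num_units)]
    for row in rows:
        units[row[key_index] % num_units].append(row)
    return units


def skew_ratio(units):
    """Return the largest unit size divided by the mean unit size.

    A value near 1.0 means an even distribution; larger values mean
    one processing unit holds disproportionately more rows.
    """
    sizes = [len(unit) for unit in units]
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# Hypothetical intermediate result of 1000 rows in which 700 rows
# share the same join key (7) and the remaining 300 rows are spread
# evenly over keys 0..99.
rows = [(7, i) for i in range(700)] + [(i % 100, i) for i in range(300)]

units = redistribute_by_key(rows, key_index=0, num_units=4)
sizes = [len(unit) for unit in units]
print("rows per processing unit:", sizes)
print("skew ratio:", skew_ratio(units))
```

In this hypothetical data set, every row with join key 7 is sent to the same processing unit, so that unit receives 775 of the 1000 rows while each of the other three receives 75. The heavily loaded unit determines when the JOIN can complete, which is the additional processing time described above.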