This invention relates to parallel-processor database systems and more particularly to a method for localizing execution and determining collocation of execution of subqueries in a parallel database.
A typical parallel processor computer system has a number of resources such as processors, memory buffers and the like. These resources can operate simultaneously, thereby greatly improving the performance of the computer when executing a task which has a number of sub-tasks that can be executed independently of each other.
Executing a task usually involves executing a number of sub-tasks, each of which in turn may have several parts. In a computer having only one processor, each step in executing each part of a sub-task is performed sequentially. In a parallel processor computer, several such operations can be performed simultaneously, but typically the parallel computer system does not have enough resources to satisfy every sub-task at once. Resolving conflicting demands by the various sub-tasks for access to such resources has been a problem in the design of parallel processor computer systems, especially in the context of using such computer systems to evaluate complicated queries of a database.
Various kinds of parallel-processor database computer architectures have been proposed. Most of the proposed architectures for parallel-processor computers use a "shared-nothing" approach. A shared-nothing architecture comprises a collection of independent processors each having its own memory and disk and connected to the other processors via a high-speed communication network. In a shared-nothing database architecture, communication and synchronization overhead are critical factors in overall query performance. Shared-nothing systems are particularly well-suited to evaluate queries that can be partitioned into independent sub-problems, each of which is executed in parallel with the others.
There is a continuing need for a way to optimize query execution in a shared-nothing computer so as to make the most effective use of the various resources of the computer.
In shared-nothing database systems, the concept of "compatible partitioning" to localize database operations is a known technique to minimize inter-processor communication. For example, by partitioning tables t1 and t2 on t1.a and t2.a respectively, all communication can be avoided in computation of the JOIN "t1.a=t2.a". This result follows since a partition of t1 will only join with a partition of t2 on the same node.
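The following minimal sketch illustrates why compatible partitioning allows the join to be computed without inter-node communication. It is an illustration under stated assumptions, not the partitioning scheme of any particular system: the modulo partitioning function `node_for`, the node count `NUM_NODES`, and the sample rows are all hypothetical. Only the table names t1 and t2 and the column a come from the example above.

```python
NUM_NODES = 4  # hypothetical number of nodes


def node_for(key, num_nodes=NUM_NODES):
    """Assumed partitioning function: map an integer key to a node number."""
    return key % num_nodes


def partition(rows, column, num_nodes=NUM_NODES):
    """Distribute rows across nodes by the value of the partitioning column."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[node_for(row[column], num_nodes)].append(row)
    return parts


def join_on_a(left_rows, right_rows):
    """Nested-loop equi-join on column "a"."""
    return [(l, r) for l in left_rows for r in right_rows if l["a"] == r["a"]]


def collocated_join(t1_parts, t2_parts):
    """Join each partition of t1 only with the partition of t2 on the same node."""
    result = []
    for p1, p2 in zip(t1_parts, t2_parts):
        result.extend(join_on_a(p1, p2))
    return result


# Hypothetical sample data for tables t1 and t2.
t1 = [{"a": k, "x": "t1-%d" % k} for k in range(10)]
t2 = [{"a": k, "y": "t2-%d" % k} for k in (1, 3, 3, 8, 12)]

local_result = collocated_join(partition(t1, "a"), partition(t2, "a"))
global_result = join_on_a(t1, t2)
```

Because rows with equal values of a are always assigned to the same node, the node-by-node join in `collocated_join` produces the same pairs as the join over the unpartitioned tables in `join_on_a`.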
There still remains a need for an efficient way to optimize subqueries in a multi-processor or parallel computer system, and particularly in a "shared-nothing" computer system.
The present invention provides a method for localizing execution and determining collocation of execution of subqueries in a parallel database. The method according to the present invention is suitable for both subqueries that involve correlation and subqueries that do not.
The method according to the present invention reduces the system resources needed for processing a query by reducing the number of processes used when a partitioning key of any table involved in the query is specified by an equality to a constant, host-variable, IN-list, or any internal run-time computation. The method reduces the number of processes: (1) by reducing the number of nodes involved in the query; or (2) by combining multiple processes into one.
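As a rough illustration of reducing the number of nodes involved in a query, the sketch below computes which nodes need to be contacted when the partitioning key is restricted to a constant or an IN-list. The modulo partitioning function and the names `node_for` and `nodes_to_contact` are assumptions made for this example only; an actual system would consult its own partitioning map.

```python
def node_for(key, num_nodes):
    """Assumed partitioning function: map an integer key to a node number."""
    return key % num_nodes


def nodes_to_contact(key_values, num_nodes):
    """Return the nodes that can hold rows matching the partitioning-key predicate.

    key_values is the list of values the partitioning key is equated to
    (one value for "key = constant", several for an IN-list), or None when
    the query places no equality restriction on the partitioning key.
    """
    if key_values is None:
        # No restriction on the partitioning key: every node may hold matches.
        return list(range(num_nodes))
    return sorted({node_for(v, num_nodes) for v in key_values})
```

For example, with four nodes, `nodes_to_contact([5], 4)` yields a single node, while `nodes_to_contact(None, 4)` yields all four, so a query with an equality predicate on the partitioning key needs a process on only one node under these assumptions.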
The method according to the present invention also uses the concept of "compatible partitioning" in shared-nothing database systems to eliminate excess processing and communication for subqueries thereby improving response time and throughput.
In a first aspect, the present invention provides a method for determining locality for execution of subqueries for queries in a relational database management system, wherein said queries comprise an outer query and a subquery having a query-subquery operator and wherein partitioning columns for the query and subquery are provided, said method comprising the steps of: (a) determining if said outer query and said subquery are compatibly partitioned; (b) if said outer query and said subquery are compatibly partitioned then for each pair of partitioning columns in said outer query and said subquery determining an equivalence class for each of said columns in said pair; (c) determining if the partitioning column for said subquery belongs to the same equivalence class as the partitioning column for said outer query; (d) determining if said query-subquery operator comprises a selected operator; and (e) if said steps (c) and (d) are true, then determining locality for said subquery so that said subquery is executable locally with respect to said outer query by the relational database management system.
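Steps (a) through (e) can be sketched as follows. This is a simplified illustration under several assumptions and is not the claimed implementation: equivalence classes are assumed to be derived from column-equality predicates and are tracked with a union-find structure, the result of the compatibility test of step (a) is passed in as a flag, and the set of selected operators is supplied by the caller because this section does not enumerate it. All identifiers are hypothetical.

```python
class EquivalenceClasses:
    """Union-find over column names; columns equated by predicates share a class."""

    def __init__(self):
        self.parent = {}

    def find(self, col):
        self.parent.setdefault(col, col)
        while self.parent[col] != col:
            # Path halving keeps the trees shallow.
            self.parent[col] = self.parent[self.parent[col]]
            col = self.parent[col]
        return col

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def subquery_is_local(outer_cols, sub_cols, compatibly_partitioned,
                      equalities, operator, selected_operators):
    """Return True if the subquery may execute locally with the outer query."""
    # Step (a): the outer query and the subquery must be compatibly partitioned.
    if not compatibly_partitioned or len(outer_cols) != len(sub_cols):
        return False
    # Step (b): build equivalence classes from the equality predicates (assumed source).
    classes = EquivalenceClasses()
    for left, right in equalities:
        classes.union(left, right)
    # Step (c): each pair of partitioning columns must fall in the same class.
    same_class = all(classes.find(o) == classes.find(s)
                     for o, s in zip(outer_cols, sub_cols))
    # Step (d): the query-subquery operator must be one of the selected operators.
    operator_ok = operator in selected_operators
    # Step (e): the subquery is local only if both (c) and (d) hold.
    return same_class and operator_ok
```

As a usage example, with outer partitioning column "t1.a", subquery partitioning column "t2.a", the predicate t1.a = t2.a, and a caller-supplied operator set containing "IN", the call `subquery_is_local(["t1.a"], ["t2.a"], True, [("t1.a", "t2.a")], "IN", {"IN"})` returns True. If the only equality were between t1.b and t2.a, the partitioning columns would fall in different classes and the result would be False.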
In a second aspect, the present invention provides a relational database management system for use with a computer system wherein queries are entered for retrieving data from tables and wherein partitioning columns and partitioning keys are provided, said system comprising: means for processing nested queries comprising an outer query and a subquery; means for determining locality of execution of said subquery including, (a) means for determining if said outer query and said subquery are compatibly partitioned; (b) means for determining an equivalence class for each column forming a corresponding pair of partitioning columns for said outer query and said subquery; (c) means for ascertaining if the partitioning column for said subquery belongs to the same equivalence class as the partitioning column for said outer query; (d) means for determining if said query-subquery operator comprises a selected operator; and (e) means responsive to said means for ascertaining and said means for determining said selected operator for determining locality of said subquery so that said subquery is locally executable with respect to said outer query by the relational database management system.
In a third aspect, the present invention provides a computer program product for use on a computer wherein queries are entered for retrieving data from tables, wherein said queries comprise an outer query and a subquery having a query-subquery operator and wherein partitioning columns for the query and subquery are provided, said computer program product comprising: a recording medium; means recorded on said medium for instructing said computer to perform the steps of, (a) determining if said outer query and said subquery are compatibly partitioned; (b) if said outer query and said subquery are compatibly partitioned then for each pair of partitioning columns in said outer query and said subquery determining an equivalence class for each of said columns in said pair; (c) determining if the partitioning column for said subquery belongs to the same equivalence class as the partitioning column for said outer query; (d) determining if said query-subquery operator comprises a selected operator; and (e) if said steps (c) and (d) are true, then determining locality for said subquery so that said subquery is locally executable with respect to said outer query by the relational database management system.