1. Field of the Invention
This invention relates generally to database management systems and, more particularly, to efficient evaluation of SQL statements processed in relational database management systems.
2. Description of the Related Art
Information is frequently stored in computer processing systems in the form of a relational database. A relational database stores information as a collection of tables having interrelated columns and rows. A relational database management system (RDBMS) provides a user interface to store and retrieve the information and provides a query methodology that permits table operations to be performed on the data. One such RDBMS is the Structured Query Language (SQL) interface, which is specified by standards adopted by the American National Standards Institute (ANSI) and the International Standards Organization (ISO) following original development work by the International Business Machines (IBM) Corporation. The SQL interface permits users to formulate operations on the data tables either interactively, or through batch file processing, or embedded in host languages such as C, COBOL, or the like.
In particular, SQL provides table operations with which users can request database information and form one or more new tables out of the operation results. Data from multiple tables, or views, can be linked to perform complex sets of table operations with a single statement. The table operations are specified in SQL statements called queries. One typical SQL operation in a query is the xe2x80x9cSELECTxe2x80x9d operation, which retrieves table rows and columns that meet a specified selection parameter. Another operation permitted by SQL is the xe2x80x9cJOINxe2x80x9d operation, which concatenates all or part of two or more tables to create a new resulting table. For example, a query might produce a table that contains the names of all supervisory employees who live in a given city, and might do so by specifying a SELECT operation to retrieve employee names and resident cities from one table, and then performing a JOIN of that data after a SELECT operation to retrieve employee names and job titles from another table.
An SQL query generally includes at least one predicate, which is an SQL expression that can assume a logical value of TRUE, FALSE, or UNKNOWN. A predicate typically either specifies a data range, tests for an existence condition, tests for equivalence, or performs a similar table comparison operation. SQL queries and their resulting table operations can be nested through several levels of predicates such that a higher nested predicate, or level of operation, cannot be evaluated until after a lower level predicate, or operation, has been evaluated. A lower level of SQL operation in an SQL statement is generally referred to as a subquery. An example of an SQL query is provided below in Table 1:
SELECT A.y, sum (B.y)
FROM A, B
WHERE A.x=B.x
GROUP BY A.y;
Those skilled in the art will understand that the notation xe2x80x9cA.yxe2x80x9d indicates the y column for all rows of Table A.
When a query is input to an RDBMS, it is received by a query processor that puts the query through a query optimization process. The optimization is generally performed by a query optimizer of the RDBMS. During query optimization, the SQL query is parsed, or rewritten, into an RDBMS internal representation referred to as the query graph model (QGM). The QGM is a high-level, graphical representation of the input query in which boxes represent relational operations and arcs connecting the boxes represent quantifiers that reference tables. A QGM representation of the Table 1 query is shown in FIG. 1. The three boxes represent the three subqueries that make up the query of Table 1 and indicate operations that are executed on incoming data, which flows from bottom to top. The arcs represent quantifiers that in some way restrict the information flowing into the respective boxes. The QGM representation of FIG. 1 can best be understood by matching each QGM box from the bottom up. Thus, the xe2x80x9cselect . . . from A, Bxe2x80x9d subquery corresponds to the two arcs A and B, respectively, and the xe2x80x9cA.y, sum(B, y)xe2x80x9d is indicated by the quantifier arc. The subquery xe2x80x9cwhere A.x=B.xxe2x80x9d corresponds to the bottom QGM box and the xe2x80x9cgroup by A.yxe2x80x9d is represented by the middle box. The final SELECT box in FIG. 1 represents the retrieval of data satisfying the query.
In general, each box of the QGM includes the predicates applied by the box relational operation and is associated with characteristics such as an input or output order specification where appropriate, a xe2x80x9cdistinctxe2x80x9d flag, and the like. A basic set of QGM boxes from which queries can be represented includes SELECT, GROUP BY, and UNION. Join operations are represented by a SELECT box with two or more input quantifiers, whereas the ORDER BY operation is represented by a SELECT box with an output order specification. As part of the query optimization processing, the original QGM can be transformed into a more efficient QGM using techniques such as conversion heuristics, predicate push-down, view merging, and subquery-to-join transformation. The transformed QGM is semantically equivalent to the original QGM, but can be more efficiently evaluated by the query processor.
After the QGM is generated and optionally transformed, cost-based optimization is performed in which the QGM is graphically traversed and a query execution plan (QEP) is generated for evaluation. The RDBMS query processor interprets the QEP and thereby executes it, retrieving the requested data. The QEP can be viewed as a dataflow graph of operators, where each node of the graph corresponds to a relational operation such as a JOIN or a relatively low level operation such as a SORT. Each operator consumes one or more input records (that is, tables) and produces an output set of records (more tables). These tables will be referred to generally as output streams. A QEP representation of the Table 1 query is shown in FIG. 2.
FIG. 2 shows that a QEP graphical representation includes circular or oval objects that represent operators, connected by arcs that represent information streams. Thus, in FIG. 2 a table scan operator that acts on Table A produces information that is fed to a xe2x80x9cjoinxe2x80x9d oval, along with information from a table scan operator that acts on Table B. After the JOIN operator, which generates the output corresponding to the xe2x80x9cwhere A.x=B.xxe2x80x9d clause of the SQL query, the information stream flows into the xe2x80x9csortxe2x80x9d oval, and then the information stream flows into the xe2x80x9cgroup byxe2x80x9d oval.
Each stream in a QEP has an associated set of properties. Properties are what characterizes the information that is being moved between operations. Examples of properties include the columns that make up each record in the input stream, the set of predicates that have been applied to the stream, and the order of the stream. Each operator in a QEP determines the properties of its output stream. The properties of an operator""s output stream are a function of its input stream and its applied operation. For example, a sort operator passes on all the properties if its input stream unchanged except for the order property and cost.
A QEP is generated, or built, from the bottom up by the query optimizer, operator by operator, using the QGM as a guide. A set of properties are computed for each operator, and as the QEP is built, the optimizer matches the QEP""s properties against requirements on those properties. The requirements can arise from the SQL query or from the characteristics of operators, such as JOIN. For example, a query with an ORDER BY clause results in an xe2x80x9corderxe2x80x9d requirement. If a QEP did not already satisfy the order dictated by the ORDER BY clause, then a sort operation would be added to the QEP to satisfy the ordering requirement of the ORDER BY operator.
As a QEP is built, different plan alternatives for operators are generated and compared. An analytical cost model is typically used to estimate the execution time of a QEP and thereby reflect the resource cost of the QEP. The QEPs are compared on the basis of resource cost using known techniques, and the more costly QEPs are pruned, or discarded, from further consideration. In this way, the output of the query optimizer should be the least costly QEP.
During the pruning process, only QEPs with similar properties are compared to each other. For example, if P1 and P2 are two QEPs being compared, and if xe2x80x9c less than =xe2x80x9d is the symbol meaning that a first property is less general or equivalent to a second property, then P2 prunes P1 (P2 is kept and P1 is discarded) if it is true that P2.cost less than =P1.cost, and if for every property x, P1.x less than =P2.x. That is, P1 is pruned if it costs more than P2 (P2.cost less than =P1.cost) and if all of its properties are less general (P1.x less than =P2.x). As a further example, if P1 was sorted on column x, and if P2 was sorted on columns x and y, then the order of P1 is less than (less general than) the order of P2, written as: P1.order less than =P2.order. That is, the statement xe2x80x9cP1.order less than =P2.orderxe2x80x9d is true because the order property of P1 is less general than the order property of P2, so it cannot be used to satisfy an order requirement for x and y. It should be noted that the definition of xe2x80x9c less than =xe2x80x9d varies from property to property.
In an MPP system with a shared-nothing architecture, a collection of processors comprise nodes of a network and are used to execute queries in parallel. In such a system, a given query is broken up into subtasks and all subtasks are executed in parallel. Nodes in an MPP, shared-nothing system are typically connected by high-speed communication lines over which subtasks are coordinated across nodes. That is, the network is used to coordinate subtasks across nodes and also to exchange data between nodes. Each node has its own memory and disk. To permit such parallel operation, the data tables are horizontally partitioned across nodes. Access to a partition is given only through the node that manages that partition.
The rows of a table are typically assigned to a node by applying some deterministic partitioning function to a subset of the table""s columns. These columns are called the partitioning key of the table. A simple example illustrating how a table might be horizontally partitioned is shown in FIG. 3, in which a table called xe2x80x9cTable Axe2x80x9d has been partitioned. In particular, Table A has been partitioned on column A.x, because rows are assigned to node 0 or to node 1 depending on the value in column x of Table A. The partitioning function assigns the rows of Table A to node 0 or to node 1. Rows with even values in A.x are assigned to node 0, while rows with odd values in A.x are assigned to node 2. Thus, the partitioning key is A.x. A partitioning function is typically based on a simple hashing scheme such as this.
Optimizing SQL queries in an MPP, shared-nothing architecture system in which subtasks are executed in parallel is more complicated as compared to simpler, single processor systems in which tasks can be executed serially. One consideration for the MPP system is that the query processor must determine how to break the QEP into subtasks and then must determine how to merge the results of those subtasks for delivery to an end user. As noted above, as the QEP is built, properties are computed for each operator and those properties are associated with various requirements. One of the properties conventionally used is called the ordering property. This property relates to the order in which the rows of a table are maintained. It should be understood that the ordering property is changed during query evaluation, such as by sorting operations. It is known to provide a query optimizer that can consider changes in the ordering property in evaluating QEPs.
Because MPP, shared-nothing systems are more complicated in terms of query evaluation and execution, it would be advantageous to provide a query optimizer that can consider properties in addition to table order when formulating QEP selection. This would promote more efficient pruning of alternative QEPs.
From the discussion above, it should be apparent that there is a need for a relational database management system that efficiently evaluates complex query statements in an MPP, shared-nothing environment by careful consideration of query operator properties in addition to table ordering. The present invention fulfills this need.
The present invention provides a relational database management system (RDBMS) with a query processor that uses the partition property to perform QEP pruning based on considerations derived from the partition property. The query processor also makes use of partition classes that are designated xe2x80x9cinteresting classesxe2x80x9d to perform preoptimization planning and query pruning, and to perform look-ahead partitioning based on partition classes that are identified as being of interest to future operations. In this way, the relational database management system more efficiently evaluates complex query statements in an MPP, shared-nothing environment.
In one aspect of the invention, the partition property maintained by the query processor includes identification of the nodes over which tables are partitioned and includes information about the function used to assign table rows to nodes, with optional indicators of special circumstances such as an indication that all rows of an intermediate result table will be sent to a single node or an indication that a copy of each table row was sent to each partition node. The query processor makes use of the partition property information to compare alternative QEPs for pruning such that the query processor will prune the QEP having the less general partition property value and having the greater execution cost.
In another aspect of the invention, processing prior to QEP generation permits the query processor to look ahead and consider interesting partition classes, which are defined to be any partitioning of the data that may prove useful for executing a query. In this way, the query processor can determine if a query operator has a partition property that has been designated an interesting partition property, such that the QEP having that query operator may be retained, though it might otherwise be pruned. Properties designated to be interesting are those properties determined to be potentially of good use in executing a query so that overall efficiency of table operations may be increased. Similarly, looking ahead may be used to control partitioning of a table to consider future table operations in determining where to send the output of table partitioning, again increasing efficiency of table operations.
Other features and advantages of the present invention should be apparent from the following description of the preferred embodiment, which illustrates, by way of example, the principles of the invention.