Database applications interact with a database server by submitting to the database server commands that cause the database server to perform operations on data stored in a database. A database command that is sent from a database application to a database server contains the “original statement” of the database command. For the database server to process the commands, the commands must conform to a database language supported by the database server. One database language supported by many database servers is known as the Structured Query Language (SQL).
Database Command Compilation
When a database server receives the original statement of a database command from a database application, the database server must first determine which actions should be performed in response to the database command, and then perform those actions. The act of preparing for performance of those actions is generally referred to as “compiling” the database statement or command, while performing those actions is generally referred to as “executing” the database statement or command.
During the compilation of a query statement (e.g., a SQL statement), the database server may perform a significant amount of preliminary work for the statement, such as parsing, semantic analysis, and query plan generation. One set of information generated during compilation is the execution plan for performing the operations specified by the query statement.
One method of representing an execution plan is a row-source tree. At execution, traversal of a row-source tree from the bottom up yields a sequence of steps for performing the operation(s) specified by the database statement. A row-source tree is composed of row-sources. During the compilation process, row-sources are allocated, and each row-source is linked to zero, one, two, or more underlying row-sources. The makeup of a row-source tree depends on the database statement and the decisions made by a database server query optimizer during the compilation process. Typically, a row-source tree is comprised of multiple levels. The lowest level, the leaf nodes, access rows from a database or other data store. The top row-source, the root of the tree, produces, by composition, the rows of the query that the tree implements. The intermediate levels perform various transformations on rows produced by underlying row-sources. A row-source typically includes data about the execution plan, such as SQL operation, position, object identification, and number of rows processed by a single row-source within an execution plan.
The row-source tree representation of execution plans is described in detail in U.S. Pat. No. 5,857,180, entitled “Method and apparatus for implementing parallel operations in a database management system”, issued to Hallmark et al. on Jan. 5, 1999, the entire contents of which are incorporated herein. Hallmark also describes a “row-source approach” for parallelizing the operations required by an SQL command by parallelizing portions of the execution plan of the query.
Parallel Execution of a Query
Sequential query execution uses one processor and one storage device at a time. In contrast, parallel query execution uses multiple processes to execute, in parallel, suboperations of a query. For example, virtually every query execution includes some form of manipulation of rows in a relation, or table, of the database management system (DBMS). Before any manipulation can be done, the rows must be read, or scanned. In a sequential scan, the table is scanned using one process. Parallel query systems provide the ability to break up the scan such that more than one process can get involved in performance of the table scan.
Various techniques have been developed for parallelizing queries. Such techniques typically rely on an underlying query processing model. For example, one model (a “row-source model”) for parallelizing queries is described in U.S. Pat. No. 5,857,180, which was mentioned above. According to the row-source (iterator) model of SQL execution, data flows through the SQL operations modeled by a row-source in the form of row-vectors. Specifically, a parallel plan is built on the Query Coordinator (QC). The parallel plan is subdivided into sub-plans or sub-trees, each called a DFO (Data Flow Object), each of which are scheduled and run in parallel on a set of parallel execution slaves.
Table Functions
Table functions are functions which produce a set of rows as output. The set of rows produced by a table function may be treated as a “virtual” table. Thus, any database statement used for manipulating data in a table may be used to manipulate data produced by a table function.
A database command that includes a table function is referred to herein as a “table function statement”. Table function statements, and the execution thereof, are described in greater detail in U.S. patent application Ser. No. 09/939,311, entitled “METHOD AND SYSTEM FOR PARALLEL EXECUTION OF TABLE FUNCTIONS”, filed on Aug. 24, 2001, the content of which is incorporated by this reference in its entirety for all purposes as if fully disclosed herein; and in U.S. patent application Ser. No. 09/938,982 entitled “METHOD AND SYSTEM FOR PIPELINED DATABASE TABLE FUNCTIONS”, filed on Aug. 24, 2001, the content of which is incorporated by this reference in its entirety for all purposes as if fully disclosed herein.
Table functions were introduced into database systems to support user-defined transformations inside the database server. Thus, a table function is a non-native function registered with and stored inside the database (e.g., by an application or end user). Herein, the terms “table function” and “user-defined table function” are used interchangeably, with the only restriction being that the function is used in a FROM clause of a query and have an argument which specifies a stream of rows from a sub-query (e.g., a ref-cursor argument), as shown by example hereafter. Because table functions are important in the transform phase of an ETL (Extract Transform Load) process of data-warehousing, it is important to support parallelism as a means of making table functions performant and scalable.
One approach to parallelized processing of table functions enforces partitioning of the input stream to table functions, so that each slave process working on the table function receives a “predefined” subset of the input data. The input stream to a table function corresponds to the output stream of the operation below the table function (e.g., the table functions “ref-cursor”) in the row-source tree.
Because the logic implemented in a table function is user-defined, it is typical that the table function developer defines the correct and performant partitioning strategy for the function's input stream. Three different partitioning strategies are commonly supported in database systems: ANY, HASH and RANGE.
ANY specifies that the table function is indifferent as to how the input stream to the table function is repartitioned among various table function slaves.
HASH(x) specifies that the input stream to the table function should be repartitioned by a suitable (server-defined) hash function on column x of the ref-cursor.
RANGE(x) specifies that the input stream to the table function should be repartitioned by range on column x of the ref-cursor.
A table function's partitioning method is typically enforced using a data redistribution operation (e.g., a “Table Queue” or “PX SEND”) between the table function and the operation below the table function in the row-source tree. In other words, the execution plan for a database statement that includes a table function is constructed to include a redistribution of the data stream output from the table function's ref-cursor, to send subsets of the output data to each of the table function slaves.
For example, a parallel table function row-source (which is represented as a COLLECTION ITERATOR row-source in the execution plan) always had a redistribution Table Queue below. This redistribution operation enforced the redistribution strategy specified by the PARALLEL PARTITION BY clause of the table function, on the rows flowing up from the slaves executing the top row-source corresponding to the ref-cursor query block to the slaves executing the table function (i.e., collection iterator) row-source.
Consider the following simple table function, F, which requires a hash partitioning on one attribute (col1) of its ref-cursor (C_cursor).
create functionF(C_cursor refcursor_type)return tab_of_recordsparallel_enable(partition C_cursorby HASH(col1)) isbegin...return (record(col1, col2, col3));end.Thus, function F is to be parallelized by HASH partitioning the rows from cursor “C_cursor” on column “col1.”
Now consider the following invocation of the function F:
SELECT col1, f_sum_col2FROMTABLE(F(CURSOR(SELECT col1, sum(col2) as sum_col2FROM tabGROUP BY col1))
In one approach to such a database statement, the group-by aggregation is typically parallelized using HASH redistribution to slaves based on the GROUP BY columns. Hence, the execution plan for this parallel query has three DFOs, as represented below.
PX COORDINATOR|(third DFO follows)COLLECTION ITERATOR (function F)|(second DFO follows)PX SEND (hash on col1)GROUP BY AGGRN|(first DFO follows)PX SEND (hash on col1)PX BLOCKTABLE SCAN (tab)This plan requires two redistributions of data through table queues (i.e., PX SEND), and communication of the redistributed data to other sets of slave processes. Of course, the communications between producer and consumer slave processes, which may execute on different system nodes, require system resources. Further, the more a database system parallelizes its operations, generally, the more complex are the server's query execution plans (e.g., the execution plan includes relatively many redistribution table queues). However, minimization of communications between producer and consumer processes is desirable.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.