In a database management system, a function or procedure is a set of statements in a procedural language (e.g., PL/SQL, C, or Java) that is executed as a unit to solve a specific problem or perform a set of related tasks. A function generally differs from a procedure in that the function returns a value to the environment in which it is called. Many database management systems permit users to create and call “table functions” which are a class of functions that produces a set of rows or data objects as output. Table functions can be viewed as a “virtual table” and may be used in the context of a FROM clause (a construct in the database query language SQL) to iterate over individual rows. Further, other SQL or database operations such as selection (e.g., WHERE clause) or aggregation (e.g., GROUP BY clause) can be performed on these rows.
One reason to utilize table functions is to allow users to implement new functions that perform complex transformations on a set of rows or which generates a set of rows, and which are not pre-existing or pre-defined system functions. Another reason for utilizing table functions is to allow users to define functions that operate upon non-native object types in a database system. Non-native object types may be created, for example, using the “CREATE TYPE” command in the Oracle 8i database management system from Oracle Corporation of Redwood Shores, Calif.
FIG. 1 depicts one approach to implementing tables functions in the context of an ETL (Extraction-Transformation-Load) process for data warehousing. The ETL process extracts data from an online transaction processing (OLTP) system 102, performs a sequence of transformations upon the extracted data, and thereafter loads that transformed data into a data warehouse 108. The transformation steps can be performed by table functions, such as table functions 104 and 106 as shown in FIG. 1. In conventional database systems, a table function cannot accept a pipelined stream of input data; thus, data is usually “staged” before being fed to a table function. In the approach of FIG. 1, the output of a first table function 104 is staged into stage 105 before it is processed by the immediately following table function 106. In effect, the entire output of table function 104 is materialized, possibly into a table or “collection type” instance, before the next transformation table function 106 receives any input data to begin its processing. A collection type describes a data unit made up of a number of elements having a particular data/object type.
Staging forms a blocking operation that presents a significant source of expense and inefficiency to the database system. If the entire output of a first transformation must be materialized before it is passed to the next transformation, excessive memory and/or disk requirements may be imposed because of the staging. Moreover, the overall response time of the ETL operation is affected because each downstream table function remains idle until its corresponding upstream table function has completely constructed the entire set of data that it is to produce. If there is a chain of table functions in the ETL process with staging between each intermediate transformation, then multiple levels of stage-based delays will be imposed. Note that this problem is not limited solely to ETL processes, but exists for many other types of database processes utilizing table functions and staging.
One embodiment of the present invention addresses this problem by implementing input pipelining for table functions. With input pipelining, data does not have to be completely materialized before it is consumed by a table function. Instead, in one embodiment, a producer of data creates a stream of data that is immediately utilized by a table function consumer of that data. This permits a chain of table functions in which a producer table function generates a pipelined output of data, which is consumed by a consumer table function that accepts a pipelined input of the data. Another aspect of an embodiment is directed to parallel processing of table functions, in which the set of work operated upon by a table function is sub-divided into smaller portions that are assigned to a plurality of database execution slaves. Yet another aspect of an embodiment of the invention is an integration between pipelining and parallelized execution for table functions.