Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information in a database, and the number of users wishing to access the database, increases, the amount of computing resources required to manage such a database increases as well.
Database management systems (DBMS's), which are the computer programs that are used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, to databases.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems. From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database.
Furthermore, significant development efforts have been directed toward query “optimization,” whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query. A query optimizer typically generates, for each submitted query, an access plan, which typically incorporates low-level information telling the database engine that ultimately handles a query precisely what steps to take (and in what order) to execute the query. In addition, the access plan may select from different access methods (e.g., table scans or index accesses), based upon the available resources in the system.
A query optimizer typically creates multiple potential access plans, and selects the best among those potential access plans based upon the “cost” of each plan. The cost of each plan represents the amount of resources expected to be utilized by the plan (typically expressed in the expected amount of time to execute), and as such, selecting the plan with the lowest cost typically results in the most efficient (and quickest) execution of a query.
Typically, an optimizer may rely upon the statistical attributes or statistics of tables referenced in a database query to compare the access plans and identify the best plan under the runtime conditions. Statistical attributes may be, for example, the number of records in a table, the cardinality of a column of a table, selectivity of a predicate of the database query (i.e., expected number of records that will be returned based upon that predicate), etc. Statistical attributes are generally saved in physical tables and/or indexes thereof. Statistics are important during optimization because the optimizer typically uses the statistics to evaluate the cost of each access plan. For example, if a database query required a join operation between two tables, based upon an estimate of the number of records in each table to be a joined, an access plan where the table with the greater number of records is to the left of the join (or inner table) may carry a higher cost than a second access plan where the table with the greater number of records is to the right of the join (or outer table). Thus, when the table referenced in a database query is a physical table, the statistics saved in the physical table are generally used by the optimizer during optimization to cost the different access plans.
However, not all tables referenced in a database query are physical tables. As generally illustrated in FIG. 4, a user query 104 in a user application 106 may reference one or more physical table(s) 108 and/or logical table(s) 110. A logical table or logical file is generally a query packaged to look like a physical table or physical column. Although logical tables are generally not physical tables, logical tables are based upon physical tables. A logical table may be, for example, a SQL view, a SQL common table expression, a SQL subquery, a partition table, etc.
Logical tables also vary in complexity. Typically, the more groupings, unions, fetch first N rows, and/or other database functionality in a logical table (or query represented by a logical table), the higher the complexity of the logical table. Generally, complex logical tables (or queries represented by the complex logical tables) are materialized into a temporary table during execution of the database query referencing the logical table, and the temporary table may be used to generate the results of the database query (e.g., the temporary table may be joined to another table in the database query).
Logical tables may reference more logical tables to any depth (e.g., logical table1 references logical table2 which references logical table3 which references logical table4 and so forth), further increasing the complexity of a logical table. As illustrated in FIG. 4, logical table 110 may reference one or more physical table(s) 112 and/or logical table(s) 114. Similarly, logical table 114 may reference one or more physical tables and/or logical tables and so forth. Thus, result 116 of the user query may be based upon any number of physical and/or logical tables.
Optimizing database queries with less complex logical tables for which a temporary table does not need to be materialized may be done using conventional techniques (e.g., merging the logical table with the database query). However, optimizing a database query that references a logical table for which a temporary table needs to be materialized is often problematic. In particular, as temporary tables do not physically exist during optimization unlike physical tables, statistics are generally not available (and/or if available, are typically based upon an arbitrary guess). Therefore, in selecting an access plan, the optimizer is generally not able to consider the statistical attributes of any temporary table that will be materialized from the logical table. As a result, the optimizer may not be able to determine during optimization, for example, if an access plan that places the logical table to the left of a join costs more than an access plan that places the logical table to the right of a join as the optimizer generally will not have any statistical information such as the number of records the temporary table of the complex logical table will contain. Furthermore, as a logical table may reference additional tables (e.g. another logical table), for which additional temporary tables may need to be materialized during execution, the optimizer may not be able to consider the statistical attributes of these additional temporary tables either during optimization.
In an effort to optimize database queries that reference complex logical tables without statistics (or statistics based upon a guess) about a temporary table that will be materialized, the optimizer generally optimizes the database query as two separate queries (i.e., the database query is treated as one query and the query represented by the logical table is treated as another). However, as the optimizer typically does not have any statistical data about the temporary table that will be materialized from the logical table, the optimizer generally places the two queries together in a functional manner (e.g., arbitrarily placing the logical table to the right of a join), but not necessarily the best manner under the runtime conditions. Typically, this approach results in the selection of suboptimal access plans and poor performing database queries.
A need therefore exists for improving the optimization of database queries, in particular, improving the optimization of database queries that reference logical tables for which temporary tables will be created during the execution of the database queries, thus, resulting in improved selection of access plans and improved query performance.