Database queries request particular information to be retrieved from a database. The information that is retrieved comprises the results of answering the query. The Structured Query Language (SQL), SQL/XML (eXtensible Markup Language), and other proprietary forms of SQL that are based on the SQL standard are examples of database languages that are used to form a database query. Although examples and embodiments shown hereinafter primarily refer to using SQL and SQL/XML, it is understood the techniques described the present application are not limited to use with SQL and SQL/XML, and can be implemented with other compatible database languages.
A query does not specifically state how the results are to be retrieved from a database. A same set of results for a particular query can be retrieved from a database by using one of a multitude of semantically equivalent access plans, or query execution plans. A query optimizer is a component of the database management system (DBMS) that generates query execution plans for a query, and selects the optimal query execution plan with the estimated lowest performance cost. The cost of executing a sub-optimal plan can be orders-of-magnitude more than the cost of executing an optimal plan. Accordingly, it is crucial to query performance that the query optimizer is able to accurately estimate and choose the most-optimal plan for execution.
Queries that have join operations are examples of queries for which query performance can vary greatly depending on the query execution plan used. Such queries require information from two or more tables, and contain join predicates to combine the information from the two or more tables. For example, information from three tables are required to answer the following query:
SELECT A.Title, C.Title, B.Actor_nameFROM A, B, CWHERE A.title = B.title and B.Actor_name = C.Actor_name
Query plans generated for this query may differ in the join order used to combine the three tables. For example, one query execution plan employs a join order that first joins A and B using the A.title=B.title predicate to produce an intermediate result, and then joins the intermediate result with C using the B.Actor_name=C.Actor_name predicate. Alternatively, another query execution plan employs a different join order that first joins B and C using the B.Actor_name=C.Actor_name predicate, and then joins the intermediate result with A using the A.title=B.title predicate.
The cost of either query execution plan can be estimated in part from the cardinality of the respective tables that are being joined. The cardinality of a table is the number of rows in the table. In the above example, the cardinality of the tables contributes to a difference in costs for a query execution plan because the cardinality affects the cardinality of the intermediate result from the first join, which in turn affects the total cost of joining together the three tables. For example, if A has a small cardinality, B has a medium cardinality, and C has a large cardinality, joining A and B before joining C may perform better than joining B and C before joining A because joining A and B will likely produce fewer intermediate results than joining B and c due to the smaller respective cardinalities of A and B. The selectivity of the join predicates also affect the execution cost of joining together the three tables.
Because the cardinalities of the tables included in a query affect the estimated cost of different query execution plans, a query optimizer considers the cardinality of tables in estimating costs for different query execution plans. However, not all tables' cardinalities are readily available to the query optimizer, including, for example, tables that are generated by executing path-based table functions within the query.
A path-based table function is a function within a query that selects XML data from a collection of XML documents based on a path expression, and produces a path-based table construct. The path-based table construct, which is a set of rows in a similar form as an object-relational table, can be used within a query wherever a table would be employed. For example, a path-based table construct can be joined with another a path-based table construct, or joined with an object-relational table. Such path-based table functions include XMLTABLE and XPATHTABLE. 
In a previous approach, when a query includes an XMLTABLE row source, the query optimizer estimates the costs of query execution plans for the query by using a fixed value, for example, 1000, as the default cardinality for the row set produced by the XMLTABLE row source. Accordingly, if a query required performing multiple join operations on three row sets produced by XMLTABLE row sources, the query optimizer would use the same default cardinality for all three row sets when estimating the costs the different query execution plans, regardless of the true cardinality of the row sets. Thus, because the same default cardinality is used in estimating the costs of the different join orders, the cardinalities of row sets produced by the XMLTABLE row sources may be ineffective as factors for estimating costs.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.