The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Relational database management systems store information in tables, where each piece of data is stored at a particular row and column. Information in a given row generally is associated with a particular object, and information in a given column generally relates to a particular category of information. For example, each row of a table may correspond to a particular employee, and the various columns of the table may correspond to employee names, employee social security numbers, and employee salaries.
A user retrieves information from and submits updates to a database by interacting with a database application. The user's actions are converted into a query by the database application. The database application submits the query to a database server. The database server responds to the query by accessing the tables specified in the query to determine which information stored in the tables satisfies the queries. The information that satisfies the queries is retrieved by the database server and transmitted to the client application. Alternatively, a user may request information directly from the database server by constructing and submitting a query directly to the database server using a command line or graphical interface.
Queries submitted to the database server must conform to the rules of a particular query language. One popular query language, known as the Structured Query Language (SQL), provides users a variety of ways to specify information to be retrieved. In SQL and other query languages, queries may include inner query blocks. For example, the query
SELECT T1.xFROM table1 T1, parts PWHERE P.y = T1.y AND P.z = ‘MED BOX’  AND T1.quantity < (SELECT AVG (T2.quantity)    FROM Table2 T2    WHERE T2.partkey = P.partkey)  AND P.quantity < (SELECT AVG (T3.quantity)    FROM Table3 T3    WHERE T3.serialnum = T1.serialnum);has an outer query block that has two inner query blocks, each of which is a subquery:                SELECT AVG (T2.quantity) FROM Table2 T2 WHERE T2.partkey=P.partkey and        SELECT AVG (T3.quantity) FROM Table3 T3 WHERE T3.serialnum=T1.serialnum.        
Determining costs for queries with multiple subqueries is difficult. The difficulty arises because each subquery reduces or “filters” the number of rows on which subsequent subqueries will be applied. The total cost of applying a subquery is a function of the number of rows to which the subquery is applied. That is, even if a cost “function” can be determined for each subquery, if it is not known on how many rows the subquery will be applied, then the cost of the subquery in the context of the query cannot be determined.
An approach to estimating the cost of a subquery is the heuristic approach. In the heuristic approach, a set of rules, or “heuristics,” are applied to the subqueries and the data on which the subqueries will be applied. The heuristics estimate the number of rows that will be “filtered” by the subquery. The estimate is used to estimate the cost for subsequent subqueries. Heuristics are again be used on the subsequent subqueries to determine how many rows each of the subsequent subqueries will filter. A problem with this approach is that the heuristic-based estimate of row filtering are made based on general rules which may not be applicable to the particular query, its subqueries, or the data to which the query is being applied. Therefore, there will be error involved with each heuristic estimation of subquery filtering, and these errors will propagate, build upon one another, and may even cause exponential increases in error.
Based on the discussion above, there is clearly a need for techniques that overcome the shortfalls of the approaches discussed above.