The present invention relates generally to database management systems. More specifically, the present invention is related to adaptive techniques functioning to dynamically reorder pipeline joins according to data characteristics observed during query execution.
A database management system (DBMS) is typically used for storing, querying, and providing information for various commercial, industrial, technical, scientific and educational applications. As can be appreciated, the volume of information stored in a DBMS increases as a function of the number of users accessing such information by query, and the amount of computing resources required to manage a typical DBMS increases as well.
FIG. 1 shows a conventional data processing system 10, such as may be embodied in a computer, a computer system, or similar programmable electronic system, and can be a stand-alone device or a distributed system as shown. The data processing system 10 may be responsive to a user input 11 and may comprise at least one computer 13 having a display 15, and a processor 17 in communication with a memory 19. The computer 13 may interface with a second computer 21 and a third computer 23 via a network 25, such as a LAN, a WAN, a wireless network, and the Internet. The computer 13 may operate under the control of an operating system 31, and may execute various computer software applications, such as a DBMS 33 in communication with a database 35 located in a database storage device 37, for example.
The DBMS 33 may use a query optimizer (not shown) to generate an optimal query execution plan during a query compilation phase. Query plan optimization may take into account such factors as platform capabilities and query content information. The query optimizer functions to select the most efficient method of executing each database query inputted to the DBMS 33. The output of an optimization process may be referred to as a query execution plan and may be represented by a query tree. The query execution plan typically provides low-level information for a database engine that includes: (i) operating steps for performing the query, (ii) the order in which the steps are to be executed, and (iii) an estimate of the time required to run the query execution plan. The result of the execution of the database query may be stored as a result set.
In a conventional DBMS, once a query execution plan is generated, the query execution plan typically remains static and unchanged during execution. An optimizer may use a cost-based model to optimize a query plan, where the quality of a query execution plan depends heavily on the cost model and related statistics. However, the statistics may be unreliable or insufficient, since the statistics are approximations of original data, and the statistics may not be accurate or remain up-to-date. Furthermore, there are many situations in which no statistics are available and some defaults or assumptions have to be used, an action that may deteriorate the accuracy of the cost model.
For example, an optimizer may assume uniformity for value distributions, or may assume that column values are independent. Such an assumption can result in incorrect selectivity estimates if the values are, in reality, highly correlated. When the assumptions used by an optimizer are not supported by the underlying data, the resulting errors in estimating the characteristics of the base data can propagate exponentially while estimating the cost of an execution plan. Thus, although an execution plan generated by an optimizer may be the optimal plan based on current statistics and assumptions, the execution plan may be far from optimal for the actual input data set.
The following query example is provided to illustrate some of the problems with static optimization.
Query A:SELECT o.name, a.driverFROM Owner o, Car c, Demographics d, Accidents aWHEREc.ownerid = o.id AND o.id = d.ownerid ANDc.id = a.carid AND(c.make=‘Chevrolet’ OR c.make=‘Mercedes’) ANDo.country1 = ‘Germany’ ANDd.salary < 50000;
The objective of Query A is to examine accidents in Germany for certain makes of cars. Since the query involves only two makes, “Chevrolet” and “Mercedes,” a likely query execution plan would have the “Car” table as an outer-most table with an index scan on “make.” Because there are relatively few “Chevrolet” cars sold in “Germany,” the “Owner” table (with the predicate “country 1=‘Germany’”) has the better filtering power when processing “Chevrolet” cars. Therefore, it appears to be more efficient to place this table before the “Demographics” table in the corresponding query execution pipeline.
In comparison, when scanning “Mercedes” cars, it would be better to have the “Demographics” table before the “Owner” table, since the former would have a better filtering effect than the predicate on “Owner” in the case of luxury cars. In this query, any static or fixed order for the “Demographics” and “Owner” tables would likely be sub-optimal for the entire data set. To achieve the optimal table order, the query execution plan should be monitored and the orders of the tables should be dynamically switched, according to the data being processed. The example of Query A illustrates the need for allowing the run-time system of a DBMS to adaptively process queries according to actual data characteristics.
Applications such as On-Line Transaction Processing (OLTP), Customer Relationship Management (CRM), and certain Web applications requiring low latencies frequently utilize pipelined query execution plans. The preferred and predominant join method used by a typical commercial DBMS for building pipelined query execution plans is a “nested-loop join,” which is particularly suited for smaller memory consumption. It is known in the present state of the art, for example, to generate multiple execution plans for a single query as a method of optimizing a query execution plan. This multiple query plan technique may either select the best complete query execution plan at the beginning of execution, or choose from embedded alternative sub-plans in the middle of execution at a materialization point. To deal with the uncertainties of run-time resources and parameter markers in queries, multiple query execution plans may be generated at compile time and, when certain unknowns become known at run time, a decision tree mechanism may select one of the alternate query execution plans.
In a method using adaptive data partitioning (ADP), for example, the source data may be divided into regions to be executed by different but complementary query execution plans, with a final phase that “stitches-up” the results from each partition to derive the complete result. In an alternative optimizer plan-based approach, the optimizer may be called to generate a new plan when inconsistencies are detected. The Progressive query OPtimization (POP) method detects cardinality estimation errors in mid-execution by comparing the optimizer's estimated cardinalities against the actual run-time count of rows at natural materialization points (e.g., sorts) in the plan. If the actual count is outside a pre-determined validity range for that part of the plan, a re-optimization of the current plan is triggered. The DB2 LEarning Optimizer (LEO), in contrast, waits until a plan has finished executing to compare actual row counts to the optimizer's estimates. It learns from mis-estimates by adjusting the statistics to improve the quality of optimizations in future queries. However, LEO does not address how to improve the performance of the current query in mid-execution.
In comparison to optimizer plan-based techniques, row-routing techniques such as the Eddies framework do not require a complex compile-time optimizer. Instead, a run-time optimizer routes each row independently through a sequence of join operators. Since each row may be routed differently, information about which joins have been completed is usually maintained with each row so as to avoid the complexities and overhead associated with a heavy-duty compile-time optimizer. Other techniques used to process stream joins are another form of row routing. For example, query scrambling adapts to data delays from remote data sources by attempting to let other parts of the query progress. “Xjoin” extends the symmetric hash join by allowing parts of the hash tables to be moved to secondary storage. When input rows are unavailable, XJoin uses the rows from secondary storage to continue probing hash tables in memory.
More recently, commonly-assigned U.S. Patent Application 2005/0060292 “Method and System for Dynamic Join Reordering” discloses a database engine and a system running a database engine that utilize a dynamic join reordering feature to change the order of two or more join operations while a query is executing. Change decision is based on fan-out and fan-in rates. In yet another technique, adaptive caching adjusts cache placement and cache maintenance dynamically in a streaming environment. Adaptive ordering of stream filters focuses on adaptively processing data streams. Dynamic plan migration changes join plans dynamically for stream join plans. Most of these stream-processing techniques are based on the symmetric hash join algorithm. Bulk join methods like symmetric hash joins require a huge amount of memory, thus being prohibitive for large scale OLTP systems, where pipelined plans are mandatory and the only low-resource pipelined join method is indexed nested-loop join (or index merge). Supporting indexed nested-loop join is crucial in order to bring adaptation techniques to commercial database systems.
Both plan-based and row-routing schemes suffer from the shortcoming that special techniques are required to deal with the incorrect introduction of duplicates due to run-time changes in the execution strategy. Plan-based re-optimization methods either delay producing results or keep track of the results that have been output. One technique suggested in a variant of POP, for example, saves all row IDs (RIDs) returned to the user by a pipelined plan. The rows produced by any newly re-optimized plan must then be anti-joined with those saved RIDs to eliminate duplicate results. To prevent duplicate results caused by overlapping processing stages in XJoin, Urhan et al. proposed a row-marking algorithm based on timestamps. State Modules (SteMs) also use a similar timestamp technique to prevent duplicates.
From the above, it is clear that there is a need for a method to dynamically reorder a pipelined join plan in mid-execution without having to re-invoke an optimizer to generate new query plans, where the method avoids the introduction of duplicate results caused by join order changes.