1. Field of the Invention
The present invention relates generally to a data processing system and in particular to query optimization in a database management system. More specifically, the invention relates to the progressive refinement of a federated query plan during query execution.
2. Description of the Related Art
Database management systems (DBMS) select a query plan by mathematically modeling the execution cost of candidate execution plans and choosing the cheapest query execution plan (QEP) according to that cost model. A cost model is a mathematical model that determines the execution cost of a query execution plan. The execution cost of a query execution plan is commonly determined by I/O costs, CPU costs, and communication costs. A QEP is a functional program that is interpreted by the evaluation engine to produce the query result. A query execution plan outlines how the DBMS will run a specific query; that is, how the data will be found or written. For example, an important decision might be whether to use indexes and, if several indexes are available, which of them to use. The cost model requires accurate estimates of the sizes of the intermediate results of all steps in the QEP. Intermediate results are the results of a partial execution of a query execution plan. Intermediate results are communicated between the current execution of the query execution plan and the next re-optimization of the query execution plan. Furthermore, intermediate results are also communicated between any subsequent execution of the query execution plan and another round of re-optimization of the query execution plan. A partially executed query execution plan is a query execution plan that is executed up to a checkpoint within the query execution plan that triggers re-optimization. A partially executed federated query execution plan is a federated query execution plan that is executed up to a checkpoint within the federated query execution plan that triggers re-optimization. Outdated or incomplete statistics, parameter markers, and complex, skewed data frequently cause the selection of a sub-optimal query plan, which in turn results in poor query performance.
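The cost-based plan selection described above can be illustrated with a minimal sketch. The weights, the `CandidatePlan` fields, and the example figures are hypothetical values chosen for illustration only; they are not taken from any particular DBMS, and an actual cost model is considerably more detailed.

```python
from dataclasses import dataclass

# Hypothetical per-unit cost weights (assumptions for illustration only).
IO_WEIGHT = 1.0     # cost per page read or written
CPU_WEIGHT = 0.01   # cost per tuple processed
COMM_WEIGHT = 0.1   # cost per kilobyte sent between servers


@dataclass(frozen=True)
class CandidatePlan:
    name: str
    pages_io: float          # estimated pages read or written
    tuples_processed: float  # estimated tuples handled by the CPU
    kb_transferred: float    # estimated kilobytes shipped over the network


def estimated_cost(plan: CandidatePlan) -> float:
    """Combine I/O, CPU, and communication estimates into a single cost."""
    return (IO_WEIGHT * plan.pages_io
            + CPU_WEIGHT * plan.tuples_processed
            + COMM_WEIGHT * plan.kb_transferred)


def choose_cheapest(plans):
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


# Two hypothetical candidates for the same query: a full scan and an index scan.
candidates = [
    CandidatePlan("table_scan", pages_io=10_000,
                  tuples_processed=1_000_000, kb_transferred=0),
    CandidatePlan("index_scan", pages_io=300,
                  tuples_processed=5_000, kb_transferred=0),
]
best = choose_cheapest(candidates)
print(best.name, estimated_cost(best))
```

Because every input to `estimated_cost` is itself an estimate, the selected plan is only as good as the size estimates supplied for each step.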
Federated queries are regular relational queries that access data on one or more remote relational or non-relational data sources, possibly combining that data with tables stored in the federated DBMS server. A federated query execution plan is a query execution plan for a federated query. The execution of federated queries is typically divided between the federated server and the remote data sources. Outdated and incomplete statistics have a bigger impact on a federated DBMS than on a regular DBMS, because the maintenance of federated statistics is considerably more complicated and expensive than the maintenance of local statistics; consequently, poor performance commonly occurs for federated queries due to the selection of a sub-optimal query plan.
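The effect of outdated statistics on a federated plan choice can be sketched as follows. The two join strategies, their cost formulas, and all row counts are hypothetical assumptions made for this illustration; they are not part of the described system and serve only to show how a wrong size estimate can flip the choice.

```python
def cost_remote_lookups(outer_rows: int, cost_per_lookup: float = 5.0) -> float:
    """Hypothetical cost: one round trip to the remote source per local row."""
    return outer_rows * cost_per_lookup


def cost_fetch_and_hash(outer_rows: int, remote_rows: int,
                        cost_per_shipped_row: float = 0.05,
                        cost_per_probe: float = 0.001) -> float:
    """Hypothetical cost: ship the whole remote table once, then probe locally."""
    return remote_rows * cost_per_shipped_row + outer_rows * cost_per_probe


def pick_plan(outer_rows: int, remote_rows: int) -> str:
    """Return the name of the cheaper strategy under the given size estimates."""
    costs = {
        "remote_lookups": cost_remote_lookups(outer_rows),
        "fetch_and_hash": cost_fetch_and_hash(outer_rows, remote_rows),
    }
    return min(costs, key=costs.get)


REMOTE_ROWS = 1_000_000   # assumed size of the remote table
estimated_outer = 100     # size of the local input according to stale statistics
actual_outer = 200_000    # size actually encountered during execution

chosen_at_compile_time = pick_plan(estimated_outer, REMOTE_ROWS)
best_in_hindsight = pick_plan(actual_outer, REMOTE_ROWS)
print(chosen_at_compile_time, best_in_hindsight)
```

Under these assumed figures, the strategy chosen from the stale estimate differs from the one that the actual input size would favor, which is the kind of sub-optimal selection described above.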
Query refinement is the refining, or changing, of a query in order to improve the performance of the query. Current methods of query refinement are applied in the query compile phase and do not interfere with query execution. All query compile-time solutions are based on the idea of having perfect a priori knowledge with which to compute a query plan. This knowledge may be obtained in several ways, such as, for example, statistics collection or sampling techniques. The goal of these solutions is to improve query compilation by supplying more accurate input parameters to the cost model. Current methods of query refinement are therefore unable to overcome the problem of input data being incomplete or inaccurate, and they are unable to recover from incorrect knowledge during query runtime.