This invention relates generally to database query optimization, and more particularly to measuring the accuracy of query optimizers.
The accuracy of a query optimizer is closely tied to the performance of a database system and to its operational cost. One of the most performance-critical factors in the accuracy of a cost-based optimizer is the accuracy of its cost model, which determines how prone the optimizer is to misestimates and, thus, to bad plan choices. The optimizer is itself one of the most performance-sensitive components in a database, as differences in query plans may result in several orders of magnitude of difference in query performance, significantly more than any other contributing factor. The more accurate the optimizer, the better and less costly the resulting query execution plans.
Database application programmers and other practitioners have long provided anecdotal evidence that database systems differ widely with respect to the quality of their optimizers. Comparing query optimizers objectively, however, is a difficult undertaking. Benchmarks such as TPC-H have been developed for assessing the query performance of database systems as a whole, end-to-end. However, no framework has been available to assess accurately the performance of the query optimizer in isolation or to permit objective comparison of the optimizers of different database systems.
There is no standard way to test an optimizer's accuracy. The cost units used in the cost model displayed with a plan do not reflect real time, but are used only for comparison of alternative plans pertaining to the same input query. Comparing these estimated cost values directly with actual execution times therefore does not permit objective conclusions about the accuracy of the cost model. Moreover, the optimization results are highly system-specific and therefore defy the standard testing approach in which results are compared to a reference or baseline to determine if the optimizer finds the “correct” solution. The optimal query plan for one system may differ significantly from that for another system because of implementation differences in the query executors and the query optimizers. These differences can lead to choosing radically different plans.
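The point about cost units can be illustrated with a minimal sketch. It assumes a set of alternative plans for one query, each with an estimated cost and a measured execution time, and checks only whether the cost ordering of each pair of plans agrees with the ordering of their execution times. The function name `concordant_fraction` and all numbers are hypothetical and chosen for illustration; this is not presented as the framework described in this document.

```python
from itertools import combinations


def concordant_fraction(estimated_costs, actual_times):
    """Fraction of plan pairs whose cost ordering matches their runtime ordering.

    Only the relative order of the costs is used, so the unit of the
    cost model does not matter. Tied pairs count as disagreements.
    """
    if len(estimated_costs) != len(actual_times):
        raise ValueError("one cost and one time are needed per plan")
    pairs = list(combinations(range(len(estimated_costs)), 2))
    if not pairs:
        raise ValueError("at least two plans are needed")
    agree = 0
    for i, j in pairs:
        cost_diff = estimated_costs[i] - estimated_costs[j]
        time_diff = actual_times[i] - actual_times[j]
        # Same sign means both measures rank the two plans the same way.
        if cost_diff * time_diff > 0:
            agree += 1
    return agree / len(pairs)


# Hypothetical values: three alternative plans for the same query.
costs = [120.0, 450.0, 300.0]   # optimizer cost units (not seconds)
times = [0.8, 2.5, 3.1]         # measured execution times in seconds

score = concordant_fraction(costs, times)
# Rescaling the cost unit leaves the result unchanged.
rescaled = concordant_fraction([c * 1000 for c in costs], times)
```

Because only orderings are compared, the result is the same whatever scale the cost model uses, which is why a check of this kind remains meaningful where a direct comparison of cost values against execution times does not.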
The ability to measure and compare objectively and independently the accuracy of optimizers across different database systems is highly desirable. Typically, systems with more accurate optimizers outperform other systems, and this effect is often magnified substantially by complex analytics queries. An optimizer's inaccuracy usually leads to heightened efforts to improve system performance, which contribute significantly to the total cost of ownership of the system. Moreover, during system development or upgrade, the ability to measure optimizer accuracy can guide the development process and may prevent regressions.
There is a need for a framework for testing and quantifying the accuracy of a database query optimizer for a given workload, as well as for enabling objective comparison of the accuracy of different optimizers with respect to their plan choices. The invention is directed to these ends, and provides such a framework.