The invention relates to database management systems, and in particular, to database query optimizers utilized in such systems.
Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information in a database, and the number of users wishing to access the database, increases, the amount of computing resources required to manage such a database increases as well.
Database management systems (DBMS""s), which are the computer programs that are used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, to databases.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query xe2x80x9coptimizationxe2x80x9d, whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query. In addition, a reduced reliance on runtime interpretation of queries in favor of increased usage of directly-executable program code has improved query engine performance.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
One area that has been a fertile area for academic and corporate research is that of improving the designs of the xe2x80x9cquery optimizersxe2x80x9d utilized in many conventional database management systems. The primary task of a query optimizer is to choose the most efficient way to execute each database query, or request, passed to the database management system by a user. The output of an optimization process is typically referred to as an xe2x80x9cexecution plan,xe2x80x9d xe2x80x9caccess plan,xe2x80x9d or just xe2x80x9cplan.xe2x80x9d Such a plan typically incorporates (often in a proprietary form unique to each optimizer/DBMS) low-level information telling the database engine that ultimately handles a query precisely what steps to take (and in what order) to execute the query. Also typically associated with each generated plan is an optimizer""s estimate of how long it will take to run the query using that plan. Some optimizer designs are referred to as xe2x80x9crule-basedxe2x80x9d optimizers. Rule-based optimizers implement optimization algorithms in terms of rules that define specific operations that need to occur when certain conditions are met. Such rules are defined statically and are hard coded into the optimizers.
An optimizer""s job is often necessary and difficult because of the enormous number (i.e., xe2x80x9ccountably infinitexe2x80x9d number) of possible query forms that can be generated in a database management system, e.g., due to factors such as the use of SQL queries with any number of relational tables made up of countless data columns of various types, the theoretically infinite number of methods of accessing the actual data records from each table referenced (e.g., using an index, a hash table, etc.), the possible combinations of those methods of access among all the tables referenced, etc. An optimizer is often permitted to rewrite a query (or portion of it) into any equivalent form, and since for any given query there are typically many equivalent forms, an optimizer has a countably infinite universe of extremely diverse possible solutions (plans) to consider. On the other hand, an optimizer is often required to use minimal system resources given the desirability for high throughput. As such, an optimizer often has only a limited amount of time to pare the search space of possible execution plans down to an optimal plan for a particular query.
Since interesting queries are generally long, complex, and involve many relational tables, optimizers typically break many queries into steps. Conventional optimizers typically generate alternative plans for each step, compute their respective estimated costs, and then incrementally combine the estimated costs for the various steps for comparison against other competing plans or portions thereof. Because of the large search space mentioned above, early elimination of entire quadrants of the search space is often necessary. In addition, the generation of alternative plans, and the subsequent discarding of such plans as a result of comparisons made with other plans, must be performed quickly and without disrupting the original state of a query.
In this regard, one way to view a query optimizer is that of a query execution xe2x80x9csimulatorxe2x80x9d, since in order to do a cost estimate of an access plan for any part of a query (or the entire query), an optimizer must simulate the environmental conditions under which the query will eventually be executed.
The optimization algorithms utilized by conventional query optimizers are continually improved, and new query optimization algorithms are devised every day. However, optimization algorithms and techniques do not operate in a vacuum, and the design of an optimizer often must take into account the various performance characteristics of various algorithms and techniques utilized therein. Conventional optimizers are typically developed with a given set of techniques and optimizations in mind, and as a result, it is very difficult to modify or add mechanisms to an optimizer design to improve the performance of the design vis a vis one particular query form, without inadvertently introducing regression in the performance of other query forms.
Put another way, the implementation of conventional optimizers is often too closely coupled with the assumptions of the various algorithms they use, i.e., the knowledge available to an optimizer, and the intelligence embodied in an optimizer that utilizes such knowledge, is often unavoidably intertwined in the logic of the optimizer. As an example, the decision making implemented within conventional optimizers is often very specific, and is xe2x80x9chard codedxe2x80x9d to try various optimization algorithms in specific orders. Even in rule-based optimizers, rules are inflexibly defined and intertwined with the hard logic of the optimizer. As a result, in many instances particular optimization algorithms are inherently favored over other algorithms.
Given the continual development and refinement of optimization algorithms and techniques, modifying the operation of a particular optimizer design to incorporate new or refined optimization functionality is often desirable. However, due to the static and inflexible manner in which many optimizer designs are coded, such modifications are often difficult, costly and time consuming. A need therefore exists for a more flexible and extensible optimizer architecture that is better suited to incorporating new or improved optimization algorithms and techniques.
The invention addresses these and other problems associated with the prior art by providing an apparatus, program product and method in which a query optimizer utilizes a plurality of dynamically-dispatchable active strategy entities to generate an optimized execution plan for a database query. The plurality of active strategy entities form a optimizer framework from which the knowledge available to the optimizer, and the intelligence utilized by the optimizer in generating execution plans, may be represented in a structured, flexible, extensible and easily-modifiable manner.
Consistent with the invention, a plurality of active strategy entities including at least (but not limited to) first, second and third active strategy entities are utilized by a query optimizer to generate an optimized execution plan for a database query. The first and second active strategy entities may each be configured to generate at least a portion of an execution plan. The third active strategy entity may be configured to dynamically evaluate at least one condition associated therewith, and in response thereto, to dynamically dispatch at least one of the first and second active strategy entities. Generation of the optimized execution plan for a database query may be initiated by dynamically dispatching the third active strategy entity.
These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. However, for a better understanding of the invention, and of the advantages and objectives attained through its use, reference should be made to the Drawings, and to the accompanying descriptive matter, in which there is described exemplary embodiments of the invention.