1. Field of the Invention
The present invention generally relates to data processing and more particularly to query optimization.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems. A relational database management system is a computer database management system (DBMS) that uses relational techniques for storing and retrieving data. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.
Regardless of the particular architecture, in a DBMS, a requesting entity (e.g., an application or the operating system) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL). Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data, and so forth.
Generally, the DBMS includes a query optimizer component configured to determine the manner in which queries will be processed. The primary task of the optimizer is to determine the most efficient way to execute each particular query against a database. To this end, the optimizer typically determines an access plan for use in executing the query against the database. In general, the access plan contains low-level information indicating precisely what steps the system is to take to execute the query (e.g., using an index, a hash table, bit map, etc.). For any given query, there are a large number of possible access plans that may be chosen. Conventional optimizers are generally configured to determine the best access plan for each query it encounters, based on cost comparisons (i.e., estimated resource requirements, typically in terms of time and space) of available access plans. In selecting the access plan (and comparing associated costs), the optimizer may explore various ways to execute the query. For example, the optimizer may determine if an index may be used to speed a search, whether a search condition should be applied to a first table prior to joining the first table to a second table or whether to join the tables first.
Due to the large number of possible queries and the large number of possible access plans, running the optimizer itself may incur a substantial cost by placing a significant burden on system resources and/or requiring a significant amount of time. One approach to reduce this burden is to limit the number of times the optimizer is run by using the same access plan every time the same query is encountered. This approach is typically limited to situations where the same query is executed multiple times. In many cases, however, for a different, but similar query (e.g., queries with similar selection criteria or WHERE clauses), the optimizer may determine the same access plan. However, conventional optimizers are not designed to recognize similarities in queries and will, thus, wastefully optimize the similar query, although the same access plan will likely be returned.
Accordingly, there is a need for an improved method of optimizing query execution that overcomes the shortcomings of conventional query optimization techniques.