1. Field of the Invention
The present invention relates generally to information processing environments. More particularly, the present invention relates to capabilities that enhance substantially the operation, effectiveness, efficiency, etc. of query optimizers that are found in Database Management Systems (DBMSs).
2. Background Art
A common element of an information processing environment is a database, which is in effect a computer-based repository of information. Databases are extraordinarily prevalent and may be found on almost any computing platform including inter alia mainframe computers; computer servers; Personal Computers (PCs); handheld computers; pagers; Personal Digital Assistants (PDAs); cellular telephones, smart phones, and other wireless devices; radios; TVs; navigation systems; automobile audio systems; net appliances; etc.
A DBMS serves as something of a ‘bridge’ between the information in a database (handling inter alia the organization of the information, the storage of the information on different devices, etc.) and users of the database. Among other things a DBMS provides database users with a logical or conceptual view of a database, allowing them to not concern themselves with inter alia the physical, implementation, etc. particulars of the database. When a user wishes to perform some action on the database (e.g., to retrieve a piece of information from the database, to update a piece of information in the database, to add a new piece of information to the database, etc.) the user will typically submit a query to the DBMS.
A database may be organized according to different models such as hierarchical, network, and relational.
Under a relational model a database may comprise inter alia one or more tables (relations), each table comprising one or more rows or records (tuples), each row/record comprising one or more columns or fields (attributes), with each column/field comprising some piece of information. As an example, a database comprising information on an organization's employees might contain a table EMPLOYEES that houses one record for each employee. Each record in the EMPLOYEES table might contain fields that preserve specifics about the employee such as inter alia the employee's name (e.g., a field named EMP_NAME), home address (e.g., a field named EMP_ADDRESS), current position, salary, work telephone number, etc.
Under a relational model a ‘bridge’ DBMS takes the form of a Relational DBMS (RDBMS) and a query to an RDBMS typically takes the form of a Structured Query Language (SQL) statement.
A SQL statement (such as for example ‘SELECT EMP_NAME, EMP_ADDRESS FROM EMPLOYEES’) expresses a desired result (in the instant example, ‘please return to me the name and the address of each employee’) but does not inter alia identify how those results should be obtained. In other words, the query itself does not specify how the query should be evaluated by an RDBMS. A component of an RDBMS, a query optimizer or optimizer, is responsible for inter alia (1) identifying the different valid ways in which (plans for how) the data within the database may be accessed so as to achieve the result that is requested by a SQL statement, (2) evaluating and costing the identified plans, and (3) selecting the ‘best’ (e.g., the cheapest, the fastest, etc.) plan.
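By way of illustration only, the declarative nature of the example statement can be observed with a short Python sketch. The sketch uses the SQLite engine bundled with Python purely as a stand-in RDBMS; the sample rows and the function name run_example are hypothetical and are not taken from any particular system. The statement names only the desired columns, while the access plan reported by EXPLAIN QUERY PLAN is chosen by the engine's optimizer.

```python
import sqlite3


def run_example():
    """Run the example SELECT and ask the engine which plan it chose."""
    conn = sqlite3.connect(":memory:")
    # Table and column names follow the EMPLOYEES example; rows are made up.
    conn.execute("CREATE TABLE EMPLOYEES (EMP_NAME TEXT, EMP_ADDRESS TEXT)")
    conn.executemany(
        "INSERT INTO EMPLOYEES VALUES (?, ?)",
        [("A. Smith", "1 Main Street"), ("B. Jones", "2 Oak Avenue")],
    )
    query = "SELECT EMP_NAME, EMP_ADDRESS FROM EMPLOYEES"
    rows = conn.execute(query).fetchall()
    # The query text says nothing about how to read the table; the plan
    # below is whatever the optimizer selected.
    plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    conn.close()
    return rows, plan


if __name__ == "__main__":
    rows, plan = run_example()
    for row in rows:
        print(row)
    for step in plan:
        print(step[-1])  # last column holds the human-readable plan detail
```

The exact wording of the plan detail varies between SQLite versions, so the sketch prints it rather than relying on a fixed string.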
As it completes its work a query optimizer may identify and evaluate a number of items, artifacts, criteria, etc. including inter alia join operations.
Classically an RDBMS supports dyadic join operations, that is join operations that involve just two entities such as tables (e.g., the join operation T1 ⋈ T2 involving the two tables T1 and T2). Consequently for a query that requires an n-way join (i.e., a join operation that involves n tables where n>2) a query optimizer must inter alia enumerate or identify (possibly just some subset of) the universe of possible join combinations (i.e., a search space); evaluate, based on various criteria including for example cost, some or all of the candidates in the search space; and then string together one specific sequence of individual two-way join operations to arrive at the ‘best’ (e.g., perhaps the cheapest) way of realizing the required n-way join. For example, for a four-way join involving four tables (T1, T2, T3 and T4) a query optimizer might arrive at the specific join sequence ((T1 ⋈ T2) ⋈ (T3 ⋈ T4)).
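The enumerate, cost, and select steps described above may be sketched, under simplifying assumptions, as a textbook-style dynamic programming search over subsets of tables. This is an illustrative conventional technique and not the method of the present invention. The row counts in CARD, the uniform selectivity of one in ten, the cost measure (the sum of intermediate result sizes), and the names best_plan and show are all hypothetical.

```python
from itertools import combinations

# Hypothetical row counts for the four tables of the example.
CARD = {"T1": 100, "T2": 400, "T3": 200, "T4": 300}
# Assumed uniform join selectivity of 1/10, kept as an integer divisor
# so that all arithmetic is exact.
SELECTIVITY_DENOM = 10


def best_plan(tables):
    """Return (cost, rows, plan) for the cheapest way to join `tables`.

    Cost is the total number of rows produced by all two-way joins.
    """
    tables = tuple(sorted(tables))
    # best[subset] = (cost, output rows, plan) for joining that subset.
    best = {frozenset([t]): (0, CARD[t], t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            whole = frozenset(subset)
            candidates = []
            # Try every split of the subset into two non-empty sides.
            for k in range(1, size):
                for left in combinations(subset, k):
                    l_cost, l_rows, l_plan = best[frozenset(left)]
                    r_cost, r_rows, r_plan = best[whole - frozenset(left)]
                    out_rows = l_rows * r_rows // SELECTIVITY_DENOM
                    cost = l_cost + r_cost + out_rows
                    candidates.append((cost, out_rows, (l_plan, r_plan)))
            # Keep only the cheapest candidate for this subset.
            best[whole] = min(candidates, key=lambda c: c[0])
    return best[frozenset(tables)]


def show(plan):
    """Render a nested plan tuple as a parenthesized join expression."""
    if isinstance(plan, str):
        return plan
    left, right = plan
    return f"({show(left)} ⋈ {show(right)})"


if __name__ == "__main__":
    cost, rows, plan = best_plan(["T1", "T2", "T3", "T4"])
    print(show(plan), cost)
```

With these assumed row counts the sketch selects ((T1 ⋈ T2) ⋈ (T3 ⋈ T4)) at a cost of 2,410,000 rows; different statistics or a different cost measure would generally yield a different sequence.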
Conventional query optimization techniques often give rise to various disadvantages. For example:
1) As the number of entities (e.g., tables) in an n-way join increases the size of the resulting search space, that is the universe of possible join combinations, grows very quickly resulting in inter alia longer and longer amounts of time to iterate through the elements of the search space (to for example access, review, cost, etc. those elements).
2) For many dynamically generated queries the execution time of the query itself may be quite small but the optimization time may be quite large and thus disproportionate to the execution time.
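The growth noted in item 1) can be made concrete with a short counting sketch. It assumes that every pair of tables may be joined (i.e., cross products are permitted) and that the left and right operands of a join are distinguished; under those assumptions there are n! left-deep sequences and (2n-2)!/(n-1)! join trees of arbitrary shape. The function names are hypothetical, and a practical optimizer would typically consider only some portion of these candidates.

```python
from math import factorial


def left_deep_orders(n):
    """Number of left-deep join sequences over n tables: n!."""
    return factorial(n)


def bushy_join_trees(n):
    """Number of join trees of any shape over n tables.

    n! orderings of the tables times Catalan(n-1) tree shapes,
    which simplifies to (2n-2)! / (n-1)!.
    """
    return factorial(2 * n - 2) // factorial(n - 1)


if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10):
        print(n, left_deep_orders(n), bushy_join_trees(n))
```

For a four-way join the sketch reports 24 left-deep sequences and 120 trees in total; for a ten-way join the total already exceeds 17 billion.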
From all of the different plans that an optimizer may have to choose from, if it selects a ‘good’ plan then processing of the query will be completed ‘quickly’ (with possibly inter alia lower system resource consumption, etc.). Alternatively, if it selects a ‘bad’ plan then processing of the query will be completed ‘slowly’ (with possibly inter alia higher system resource consumption, etc.).
Given the performance, system resource consumption, etc. ramifications and implications of the query optimization process it is obviously very important for an optimizer to identify and select the ‘best’ available query execution plan. That objective—identifying and selecting the ‘best’ available query execution plan—is made challenging by the host of constraints that an optimizer must operate under including inter alia available system resources (such as memory), specific query characteristics, parameters such as the maximum amount of time that an optimizer may spend on any particular activity, the status of the RDBMS, etc.
Aspects of the present invention address the challenge that was noted above by (a) improving upon the way in which a search space is generated and managed and (b) improving upon the way in which the elements of a search space are evaluated so that, among other things, unpromising elements are efficiently dropped (pruned), while also addressing, in new and innovative ways, various of the not insubstantial challenges that are associated with same.