The present invention relates to the information management field, and more specifically, to improving processing and optimizing of database queries in a database management system. Databases come in many flavors. One popular form is a relational database management system (RDBMS), such as DB2™ system, which is manufactured by International Business Machines Corporation of Armonk, N.Y.
The RDBMS is responsible for handling all requests for access to the database where the data itself is actually stored, thereby shielding the users from the details of any specific hardware implementation. Using relational techniques, the RDBMS stores, manipulates and retrieves data in the form of table-like relations typically defined by a set of columns or attributes of data types and a set of rows (i.e. records or tuples) of data. The columns may further comprise restrictions on their data content (i.e. valid domains) and may be designated as a primary key or unique identifier for the relation or a foreign key for one or more other relations.
One very common language for dealing with RDBMSs is the Structured Query Language (SQL). SQL includes both data definition operations and data manipulation operations. To maintain data independence a query (i.e. a set of SQL commands) instructs the RDBMS what to do but not how to do it. Thus, the RDBMS includes a query processor for generating various query plans of execution and choosing the least expensive plan with respect to execution costs. Due to the high-level nature of relational expressions and a variety of implementation techniques, automatic query optimization is possible and often necessary to ensure more efficient query processing.
Often, a database application may require the creation of a “view” for the data in given relations or tables. A view provides an alternative way of looking at the data in one or more base tables. It may, for example, contain only selected columns from the table. A materialized query table (MQT) is a table whose definition is based upon the result of a query. The data that is contained in an MQT is derived from one or more tables on which the materialized query table definition is based. One way to think of an MQT is as a kind of materialized view. Both views and MQTs are defined on the basis of a query. The query on which a view is based is run whenever the view is referenced; however, an MQT actually stores the query results as data, and it is possible work with the data that is in the MQT instead of the data that is in the underlying tables.
Materialized query tables can significantly improve the performance of queries, especially complex queries. If the optimizer determines that a query or part of a query could be resolved using an MQT, the query might be rewritten to take advantage of the MQT. Thus, a current dilemma a database query optimizer is faced with regarding query performance is whether to use an existing MQT, or to fetch data directly from the buffer cache (if available) or any other cache/non cache object or perform direct I/O to fetch data from the disks. There are especially no easy solutions if the reporting requirements needs to fetch data from both RDBMS and exogenous unstructured data, such as RSS Feeds, flat files, etc.
Another option is to create a temporary MQT, which can satisfy the query under question. The trade off between the choices of using an existing MQT or temporary MQT may be understood as follows. The existing MQT has no start up time to use it, but using the existing MQT may result in long fetch times because extra I/O's on a table object must typically be performed to determine which records to discard and which to return to the user.
Conversely, the temporary MQT has a start up cost to build, but once built, the fetch time will be faster. The fetch time is faster for a temporary MQT because every record processed in the temporary MQT is one of interest, and thus no time is wasted going to the table object for records that will be discarded immediately. Such decisions need not be made solely based on information internal to the DBMS. External data, such as calendar events, unstructured data (Twitter, Facebook, blogs, etc.) can also be analyzed to predict the likelihood of queries occurring. In a simple example, if weather data indicates the likelihood of a hurricane striking a location, then building stores (e.g., Home Depot) might expect increases in the number of web queries relating to generators available at a particular store sold for less than a given price. The efficiency of answering such queries might be improved by a performance object, such as an MQT. Thus, there is a need for improved mechanisms for processing and optimizing database queries in a database management system.