A database management system (DBMS) is a computer program that allows one or more computer users to create and access data in a database. A user accessing data in a database typically composes a request in the form of a “query”. The result of a query is often a “table”. The term “table” is used herein generically, and is not meant to distinguish the DBMS discussed herein from DBMSs that use similar terms, such as “file” or “relation”.
The DBMS manages user requests (and requests from programs) so that the users and the programs need not maintain information about where the data is physically located on storage media and, in a multi-user system, other users also accessing the data. In handling user requests, the DBMS ensures the integrity of the data (i.e., ensuring that the data continues to be accessible and is consistently organized as intended) and the security of the data (i.e., ensuring that only those with access privileges can access the data). The most typical DBMS is a relational database management system (RDBMS). A standard user and program interface is the Structured Query Language (SQL).
Constraints (both referential integrity and check constraints) are conditions defined for a given table that must be true for all rows of the table. The DBMS enforces a constraint by checking each row to determine whether the row meets conditions specified by a given constraint, rejecting any rows that do not meet the conditions. This checking may be deferred until later, after appending data to a given table using a “load” operation (use of which allows the addition of data to a table). When such constraint checking is deferred until later, the given table may be placed in “check pending” state. Once the checking is complete, violating rows may be removed from the given table and the given table may be placed in “normal” state.
A materialized query table (MQT) is a table defined as the result of a query, that result being automatically used to populate the MQT, which can then be used to efficiently provide a pre-computed result, or a partially pre-computed result, for certain other queries submitted to the DBMS. An MQT that references a given base table in the query that defines the MQT (the defining query of the MQT) is said to be a “dependent” of the given base table. An MQT could reference one or more such base tables and would be said to be a dependent of each of the base tables. Additionally, a given base table could be referenced in more than one MQT and each of those MQTs would be a dependent of that given base table. When new data is added to a base table that has dependent MQTs, each of the MQTs should, in general, be updated (refreshed) to reflect a new result for the corresponding defining query.
Dependent MQTs may be refreshed immediately or the refresh may be deferred until later. When an MQT is defined as REFRESH IMMEDIATE, whenever its content is available, it should be up to date. However, after a load operation into the base table, the dependant MQTs are placed in “check pending” state and become unavailable. When an MQT is refreshed, i.e., all new data in base tables for which the MQT is a dependent is reflected in the MQT, the MQT may be placed in “normal” state. When an MQT is defined as REFRESH DEFERRED, its content reflects the result of the MQT definition query when the MQT was last refreshed and hence its content may not always be up to date with respect to its base tables.
New data can be added to base tables periodically, often through the use of a load utility. In some cases, old data is removed from base tables when the old data is no longer needed or is obsolete. When old data is removed from base tables, gaps may be left in the physical space used to store the data. Where the base table is not a multi-dimensionally clustered (MDC) table, i.e., a base table of contiguous blocks of equal-size pages, the gaps left in the physical space used to store the data are not easily usable by the load utility, because the load utility is typically optimized for speed by being configured to append new data to the end of a base table.
Since the new data is added only at the end of the base table, the base table may be checked against constraints and dependant MQTs may be incrementally refreshed. Both operations, constraint checking the rows of the base table and refreshing the dependant MQTs, are considered to be straightforward in that the operations simply require a scan of the new data in the base table to identify the changes to the base table. Such a scan starts at the previously recorded “high water mark” of the base table and, thus, considers all of the new data and none of the old data.
An MDC table comprises contiguous blocks of equal-size pages. Each block of a given MDC table is the same size. The size of a block, in pages, is termed the block size. All rows stored in a given block have the same number of specified columns. This number of specified columns for the rows of a given block is termed the “dimension” value for the rows. The DBMS manages the data in the database to maintain this trait of each block. That is, new rows may only be added to existing blocks whose rows have the same dimension values as the new rows. Additionally, empty blocks may be reused for new rows or new blocks may be allocated at the end of the MDC table. The load utility program will not use existing populated blocks for new rows but will, instead, reuse empty blocks in the MDC table or extend the MDC table if no reusable blocks exist.
An MDC table is designed such that as old blocks of data are removed from the MDC table, the space the old data occupied is available for the later addition of new data. Consequently, simply scanning the MDC table from a certain point onward will lead to the consideration of old data as well as new data. This consideration of old data may lead to incorrect results when refreshing a dependant MQT (i.e., data may be counted twice) and inefficiency when performing constraint checking on the rows of the MDC table. Since the old data has already been checked and is known to be correct, it is inefficient to check the old data again.
Clearly, there is a need for an efficient method of performing incremental constraint checking on MDC tables that takes into account the nature of MDC tables wherein new data is interspersed with old data. Additionally, maintenance of MQTs dependant on MDC tables is not as straightforward as maintenance of MQTs dependant on non-MDC tables. Consequently, there is a need for a method of maintaining MQTs dependant on MDC tables that does not lead to incorrect results when refreshing the MQTs. The need for such a system has heretofore remained unsatisfied.