In a database management system (DBMS) or other data storage and retrieval system, data is often stored as one or more database objects. Each database object may contain individual records, items, or sub-objects, and the data within each item may be further organized into one or more fields, elements, or attributes of the record. Each of the sub-objects may also qualify as a separate database object. In relational database systems, database objects often include tables or indexes that contain rows, and the rows are further organized into columns. In object oriented databases, database objects often include object classes that contain member objects that contain data stored as attributes or variables. Other database architectures may use other terminology, an the present disclosure is not limited to any particular type of database object.
Data containers or portions of data containers may be stored on disk in units that may be referred to as disk blocks or pages. Each disk block typically corresponds to a fixed amount of physical space on disk. The disk blocks may be arranged in extents. Each extent is made up of contiguous disk blocks, and the extent may be designated for storing a specific type of information. In one example, each extent is often reserved to store data from a single datafile. In this example, when writing new data to a datafile after the datafile already contains existing data, the new data may continue on disk at the end of the existing data. Reserving extents for data files may reduce or eliminate file fragmentation. In another example, a data storage system addresses storage units using extents rather than single data blocks.
Extents may further be arranged in segments. Each segment is a set of extents that have been allocated to a database object and which may be stored in the same tablespace. A segment may span multiple datafiles. A tablespace is a logical target for storing data, and a single tablespace may store multiple segments of data.
Data may also be partitioned or separated into multiple partitions. The partitions may be defined based on a partition key value, with different partition key values assigned to different partitions. For example, data corresponding to a “West” division may be assigned to the “West” partition, and data corresponding to an “East” division may be assigned to the “East” partition. Accordingly, as data is stored to a database object that has been partitioned into the “East” and “West” partitions, commands that retrieve or store data corresponding to the “East” division operate on the “East” partition, and commands that retrieve or store data corresponding to the “West” division operate on the “West” partition. Each partition may be stored in a separate segment or collection of segments.
Materialized Views
When a database management system contains very large amounts of data, certain queries against the database can take a long time to execute. The cost of executing a query may be greater when the query (which, for example, may take the form of a “SELECT” statement in the SQL database language) requires joins among a large number of database tables.
Among commercial users of database systems, it has become a common practice to store the results of often-repeated queries in database tables or some other persistent database object. By storing the results of queries, the costly join operations required to generate the results do not have to be performed every time the queries are issued. Rather, the database server responds to the queries by simply retrieving the pre-computed data.
These stored results are commonly referred to as materialized views. The content of a materialized view is defined by metadata referred to as a view definition. The view definition contains mappings to one or more columns in the one or more tables containing the data. Typically, the view definition is in the form of a database query.
Columns and tables that are mapped to a materialized view are referred to herein as base columns and base tables of the materialized view, respectively. The data maintained in the base columns is referred to herein as base data. The data contained in a materialized view is referred to herein as materialized data.
Materialized views eliminate the overhead associated with gathering and deriving the data every time a query is executed. Computer database systems that are used for data warehousing may maintain materialized views that contain pre-computed summary information in order to speed up query processing. Such summary information is created by applying an aggregate function, such as SUM, COUNT, or AVERAGE, to values contained in the base tables. Materialized views that contain pre-computed summary information are referred to herein as “summary tables” or more simply, “summaries”.
Summary tables typically store aggregated information, such as “sum of PRODUCT_SALES, by region, by month.” Other examples of aggregated information include counts of tally totals, minimum values, maximum values, and average calculations.
Through a process known as query rewrite, a query can be optimized to recognize and use existing materialized views that could answer the query. The data in materialized views needs to be identical to the data in the underlying base tables in order for the materialized to be used in place of their underlying base tables when executing rewritten queries.
Refreshing Materialized Views
As new data is periodically added to the base tables of a materialized view, the materialized view needs to be updated to reflect the new base data. A materialized view that does not require updating is considered to be “fresh”. Otherwise, the materialized view is considered to be “stale”. Materialized views are marked as stale in response to any change to the base data that is represented by the materialized view. Typically metadata stored in a database catalog is used to track the staleness of a dependent object such as a materialized view. The staleness is tracked at the object level which means the entire materialized view becomes stale after a change to the underlying base table is committed.
If a base table is partitioned, then the staleness of a materialized view of the base table may be tracked at the partition level. This type of staleness tracking is called Partition Change Tracking (PCT). Again, the database catalog is used to store PCT-related staleness, which identifies the change committed to a specific base table partition.
Stale materialized views are not safe to use because the data in the view may no longer be equivalent to the base data. A stale materialized view may be recomputed by various techniques that are collectively referred to as “refresh” operations.
Data loading and refresh of materialized views typically takes place during off-hours when the data warehouse is in a controlled period of little activity. The data loading and refresh is restricted to a time period called the refresh window during which the system can be dedicated to refresh. The refresh window is typically allowed to be no more than four to six hours. Refresh may be deferred until the end of the week or month, so that loading of additional detail data may occur much more frequently than refresh.
One approach to refreshing materialized views is referred to as the “total refresh” or “full refresh” approach. According to the total refresh approach, the values in materialized views are recalculated based on all of the base data every time new base data is supplied. Systems that employ full refresh approach have the disadvantage that the recreation process is a relatively lengthy operation due to the size and number of tables from which the materialized data is derived. For example, when ten new rows are added to a particular base table that contains a million rows, a total refresh operation would have to process all one million and ten rows of the base table to regenerate the materialized views derived using the base table.
Materialized views may also be incrementally refreshed, where, rather than generating a new set of materialized data based on calculations that use all of the base data, the materialized data is updated based on just the new base data. Incremental refresh may involve the re-computing of materialized data as data is being loaded into base tables. Thus, as one set of rows is added to a base table, the materialized view is updated based on the data in the new rows. As a second set of rows is added to a base table, the materialized view is again updated based on the data in the second set of rows. The incremental refresh is neither practical nor efficient for two reasons: first, the database user may have business reasons to not re-compute the materialized view each and every time data is loaded into base tables; and second, combining refresh with data loading degrades the performance of the data-loading procedure.
A deferred refresh technique marks materialized views as stale whenever the data in the base tables are changed. A refresh of the stale materialized views may be deferred until the end of the week or month, so that loading of additional data into the base tables of a materialized view may occur much more frequently than refresh of the materialized view.
Refreshing materialized views often requires a significant amount of time and computational power. If not performed efficiently, it is possible for the overhead associated with refreshing materialized views to outweigh any benefits derived from using the materialized views for processing queries.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.