The present invention relates to maintaining pre-computed aggregate views incrementally in the presence of non-minimal changes.
A view represents a query over tables in a database. A pre-computed view is a view that has the result of the view definition query materialized in a result table. A pre-computed aggregate view is a special case of a pre-computed view when the view definition contains aggregation. Aggregate query processing in large data warehouses can be computationally intensive. Pre-computation is an approach that can be used to speed up aggregate queries.
A pre-computed view (or more specifically, its associated result table) may become “out-of-synch” with the tables that it is derived from (often known as base or detail tables) when the data in those tables (i.e., the detail data) are modified. That is, the information in the result table is no longer accurate because the data from which the information was derived, i.e., the detail data in the base tables, have been changed. Thus, it is the result table that must be modified in order to keep the pre-computed view “in-synch” with the detail tables. This process is known as view maintenance.
A result table, also known as a materialized table or a pre-computed table, will be referred to as a pre-computed table. A pre-computed view is generally known as a materialized view but will be referred to as a pre-computed view. When a pre-computed view is a pre-computed aggregate view, its associated pre-computed table, generally known as a pre-computed aggregate or aggregate, will be referred to as a pre-computed aggregate table.
A pre-computed aggregate view can be maintained either “incrementally” using the changes to the detail data or by re-computing the view from the detail data. Incremental maintenance can be extremely fast for some types of pre-computed aggregate views and operations and more time-consuming for others.
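The contrast between the two maintenance strategies can be illustrated with a minimal sketch. It assumes a hypothetical aggregate view that keeps a SUM and a COUNT per group over detail rows of the form (group, amount); the function names `recompute` and `apply_delta` are illustrative only and are not taken from any particular system.

```python
from collections import defaultdict

def recompute(detail_rows):
    """Rebuild the aggregate table {group: [sum, count]} by scanning all detail rows."""
    agg = defaultdict(lambda: [0, 0])
    for group, amount in detail_rows:
        agg[group][0] += amount
        agg[group][1] += 1
    return dict(agg)

def apply_delta(agg, inserted, deleted):
    """Adjust the aggregate table in place, touching only the changed rows."""
    for group, amount in inserted:
        entry = agg.setdefault(group, [0, 0])
        entry[0] += amount
        entry[1] += 1
    for group, amount in deleted:
        entry = agg[group]
        entry[0] -= amount
        entry[1] -= 1
        if entry[1] == 0:          # no detail rows left for this group
            del agg[group]
    return agg

detail = [("east", 10), ("east", 5), ("west", 7)]
agg = recompute(detail)

inserted = [("west", 3), ("north", 4)]
deleted = [("east", 5)]
apply_delta(agg, inserted, deleted)

# The incrementally maintained table matches a full re-computation.
new_detail = [("east", 10), ("west", 7), ("west", 3), ("north", 4)]
assert agg == recompute(new_detail)
```

In this sketch the incremental path does work proportional to the number of changed rows, while re-computation scans every detail row. SUM and COUNT are convenient here because each change adjusts them directly; an aggregate such as MIN or MAX may require consulting the detail data when the current extreme value is deleted.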
Changes to the detail data can be a set of inserted rows, a set of deleted rows, a set of updated rows, or a combination of these. A set of updated rows can be treated as a set of deleted rows and a set of inserted rows. It is not necessary to “convert” an update of a row in the base table into a delete followed by an insert. However, the effect of the update could be treated as a delete followed by an insert for the purposes of computing the incremental changes to the materialized table.
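As a small sketch of this treatment, the following assumes that each update is available as a hypothetical (old_row, new_row) pair. The helper `split_updates` is an illustrative name; it leaves the base table untouched and only derives the delete set and insert set used to compute incremental changes.

```python
def split_updates(updates):
    """Turn (old_row, new_row) pairs into a list of deleted rows and a list of inserted rows."""
    deleted = [old for old, _ in updates]
    inserted = [new for _, new in updates]
    return deleted, inserted

updates = [
    (("east", 5), ("east", 8)),    # amount changed within the same group
    (("west", 7), ("north", 7)),   # row moved to a different group
]
deleted, inserted = split_updates(updates)
```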
Changes to detail data, when represented as a set of deleted rows and a set of inserted rows, are weakly minimal if the set of deletes is a subset of the rows in the (pre-modified) version of the detail table. They are strongly minimal if the intersection of the set of deletes and the set of inserts is empty and the conditions for weak minimality are satisfied. In order to guarantee strong or weak minimality, the changes must be preprocessed to produce minimal change sets.
Non-minimal changes are changes that are not guaranteed to be minimal, although a particular set of such changes may happen to be minimal. That is, a non-minimal change does not preclude strong or weak minimality.
Although the term “sets” is used here, the changes could be multisets and the discussion on minimality is not limited to pure relational sets. That is, the discussion also applies to bags, which are also referred to as multisets. Terms such as subset and intersection refer to bag subset and bag-based minimal intersection where multiplicities are taken into account.
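The minimality conditions, read with bag semantics, can be sketched as follows. This is an illustration only: rows are assumed to be hashable tuples, and the function names are hypothetical. Python's `collections.Counter` serves as the bag, and its `&` operator takes the minimum of the multiplicities.

```python
from collections import Counter

def is_weakly_minimal(table, deleted):
    """True if the deletes form a bag subset of the pre-modified table."""
    table_bag, deleted_bag = Counter(table), Counter(deleted)
    return all(table_bag[row] >= n for row, n in deleted_bag.items())

def is_strongly_minimal(table, deleted, inserted):
    """True if the changes are weakly minimal and the deletes and inserts do not overlap."""
    overlap = Counter(deleted) & Counter(inserted)   # bag intersection
    return is_weakly_minimal(table, deleted) and not overlap

table = [("a", 1), ("a", 1), ("b", 2)]

# Deleting both copies of ("a", 1) is allowed; deleting three copies is not.
two_copies = is_weakly_minimal(table, [("a", 1), ("a", 1)])
three_copies = is_weakly_minimal(table, [("a", 1)] * 3)

# A row that appears in both the deletes and the inserts breaks strong minimality.
overlapping = is_strongly_minimal(table, [("a", 1)], [("a", 1)])
disjoint = is_strongly_minimal(table, [("a", 1)], [("c", 3)])
```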
There are various scenarios under which maintenance has to deal with changes that are not guaranteed to be strongly or weakly minimal. A transaction that makes incremental changes to a base table might insert a row and later update that row. The update may be represented as a delete of the newly inserted row followed by an insert of the updated row. The resulting set of “deleted” rows and set of “inserted” rows are not a strongly minimal set of changes, since the intersection of these sets is not empty. They are also not a weakly minimal set, because the deleted row was never in the original table and the set of deleted rows is therefore not a subset of it. Another scenario that results in changes that are not guaranteed to be strongly or weakly minimal is deferred view maintenance, in which a view is maintained in a separate transaction after a series of changes have been made to the detail tables by other transactions. Since a set of rows inserted by one transaction may be deleted by another, the set of changes may not be weakly minimal. It is possible to convert such sets into ones that are strongly or weakly minimal by performing operations that reduce them to minimal ones. Conventional methods that require change sets to be minimal usually need this kind of preprocessing.
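The insert-then-update scenario, and one possible reduction step, can be sketched as follows. The sketch rests on assumptions that are not taken from any particular conventional method: rows are hashable tuples, the recorded changes form a valid history (a row is deleted only if it exists at that point), and the hypothetical helper `reduce_to_minimal` cancels rows that appear in both change sets.

```python
from collections import Counter

def reduce_to_minimal(deleted, inserted):
    """Cancel rows common to both change sets, respecting multiplicities."""
    d, i = Counter(deleted), Counter(inserted)
    overlap = d & i                     # bag intersection
    return d - overlap, i - overlap     # Counter subtraction drops zero counts

original = [("east", 10)]

# One transaction inserts ("north", 4) and later updates it to ("north", 6).
# The update is logged as a delete of the new row plus an insert.
deleted = [("north", 4)]
inserted = [("north", 4), ("north", 6)]

# The raw change sets overlap, and the deleted row is not in the original table.
raw_overlap = Counter(deleted) & Counter(inserted)
deleted_in_original = all(Counter(original)[r] >= n for r, n in Counter(deleted).items())

# After cancellation only the net effect remains: a single insert.
net_deleted, net_inserted = reduce_to_minimal(deleted, inserted)
```

Here the reduced change sets have an empty set of deletes and a single inserted row, so they satisfy both minimality conditions. The cost of such a pass over the change sets is the preprocessing overhead referred to above.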