The present invention relates to maintaining pre-computed aggregate views incrementally in the presence of non-minimal changes.
A view represents a query over tables in a database. A pre-computed view is a view that has the result of the view definition query materialized in a result table. A pre-computed aggregate view is a special case of a pre-computed view when the view definition contains aggregation. Aggregate query processing in large data warehouses can be computationally intensive. Pre-computation is an approach that can be used to speed up aggregate queries.
A pre-computed view (or more specifically, its associated result table) may become “out-of-synch” with the tables that it is derived from (often known as base or detail tables) when the data in those tables (i.e., the detail data) are modified. That is, the information in the result table is no longer accurate because the data from which the information was derived, i.e., the detail data in the base tables, has been changed. Thus, it is the result table that must be modified in order to keep the pre-computed view “in-synch” with the detail tables. This process is known as view maintenance.
A result table, also known as a materialized table or a pre-computed table, will be referred to as a pre-computed table. A pre-computed view is generally known as a materialized view but will be referred to as a pre-computed view. When a pre-computed view is a pre-computed aggregate view, its associated pre-computed table, generally known as a pre-computed aggregate or aggregate, will be referred to as a pre-computed aggregate table.
A pre-computed aggregate view can be maintained either “incrementally” using the changes to the detail data or by re-computing the view from the detail data. Incremental maintenance can be extremely fast for some types of pre-computed aggregate views and operations and more time-consuming for others.
Changes to the detail data can be a set of inserted rows, a set of deleted rows, a set of updated rows, or a combination of these. A set of updated rows can be treated as a set of deleted rows and a set of inserted rows. It is not necessary to “convert” an update of a row in the base table into a delete followed by an insert. However, the effect of the update could be treated as a delete followed by an insert for the purposes of computing the incremental changes to the materialized table.
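As an illustration of treating updates as deletes plus inserts, the following Python sketch splits a list of updated rows into a delete set and an insert set. The function name `split_updates`, the representation of an update as an `(old_row, new_row)` pair, and the sample rows are assumptions made for this example only.

```python
def split_updates(updates):
    """Split (old_row, new_row) pairs into a deleted bag and an inserted bag.

    The base table itself is still updated in place; only the change
    sets used for view maintenance are expressed as deletes and inserts.
    """
    deleted = [old for old, _new in updates]
    inserted = [new for _old, new in updates]
    return deleted, inserted


# Hypothetical rows of the form (group_key, amount).
deleted, inserted = split_updates([(("east", 5), ("east", 8))])
```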
Changes to detail data, when represented as a set of deleted rows and a set of inserted rows, are weakly minimal if the set of deletes is a subset of the rows in the (pre-modified) version of the detail table. They are strongly minimal if the intersection of the set of deletes and the set of inserts is empty and the conditions for weak minimality are satisfied. In order to guarantee strong or weak minimality, the changes must be preprocessed to produce minimal change sets.
Non-minimal changes are changes that are not guaranteed to be minimal, although they may happen to be. That is, describing a set of changes as non-minimal does not preclude it from being strongly or weakly minimal.
Although the term “sets” is used here, the changes could be multisets and the discussion on minimality is not limited to pure relational sets. That is, the discussion also applies to bags, which are also referred to as multisets. Terms such as subset and intersection refer to bag subset and bag-based minimal intersection where multiplicities are taken into account.
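The definitions of weak and strong minimality under bag semantics can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, rows are assumed to be hashable tuples, and `collections.Counter` stands in for a bag, with its `&` operator giving the bag-based minimal intersection.

```python
from collections import Counter


def is_weakly_minimal(base_rows, deleted_rows):
    """True if the deleted bag is a bag subset of the pre-modified table."""
    base, deleted = Counter(base_rows), Counter(deleted_rows)
    return all(base[row] >= n for row, n in deleted.items())


def is_strongly_minimal(base_rows, deleted_rows, inserted_rows):
    """Weakly minimal, and the bag intersection of deletes and inserts is empty."""
    overlap = Counter(deleted_rows) & Counter(inserted_rows)
    return not overlap and is_weakly_minimal(base_rows, deleted_rows)
```

Multiplicities matter here: deleting a row twice is weakly minimal only if the pre-modified table holds at least two copies of that row.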
There are various scenarios under which maintenance has to deal with changes that are not guaranteed to be strongly or weakly minimal. A transaction that makes incremental changes to a base table might insert a row and later update that row. The update may be represented as a delete of the newly inserted row and an insert. So, the set of “deleted” rows and the set of “inserted” rows are not a strongly minimal set of changes since the intersection of these sets is not empty. They are also not a weakly minimal set because the set of deleted rows is not a subset of the rows in the original table. Another scenario which results in changes that are not guaranteed to be strongly or weakly minimal is deferred view maintenance. A view may be maintained, in a separate transaction, after a series of changes have been made to the detail tables by other transactions. This type of maintenance is called deferred view maintenance. Since a set of rows inserted by one transaction may be deleted by another, the set of changes may not be weakly minimal. It is possible to convert the sets into ones that are strongly minimal or weakly minimal by performing various operations that reduce the sets to minimal ones. Preprocessing is usually needed in conventional methods requiring change sets to be minimal.
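The insert-then-update scenario, and one possible way of reducing the resulting change sets, can be sketched in Python. This is an illustrative assumption, not the preprocessing used by any particular conventional method: the hypothetical `reduce_changes` cancels the bag intersection of deletes and inserts, and it yields minimal sets only when every deleted row is either in the original table or among the inserted rows.

```python
from collections import Counter


def reduce_changes(deleted_rows, inserted_rows):
    """Cancel rows that appear in both bags, preserving the net effect."""
    deleted, inserted = Counter(deleted_rows), Counter(inserted_rows)
    cancelled = deleted & inserted  # bag-based minimal intersection
    return deleted - cancelled, inserted - cancelled


# A transaction inserts ("k1", 10), then updates it to ("k1", 15).
# The update is recorded as a delete of the new row plus an insert.
inserted = [("k1", 10), ("k1", 15)]
deleted = [("k1", 10)]
min_deleted, min_inserted = reduce_changes(deleted, inserted)
```

After the reduction the delete set is empty and only the final version of the row remains in the insert set, so the reduced sets no longer overlap.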
The present invention provides methods and apparatus, including computer program products, for maintaining pre-computed aggregate views incrementally in the presence of non-minimal changes to base tables.
In general, in one aspect, the present invention provides a method for maintaining pre-computed aggregate views. The method includes receiving a pre-computed aggregate view derived from one or more base tables. The pre-computed aggregate view includes a pre-computed aggregate table and a view definition. The view definition includes aggregation functions that can be any combination of sum, sum distinct, count(*), count distinct, min, and max. The view definition may include expressions that may be nullable. The method includes receiving changes to the base table. The changes may be non-minimal. The method includes defining and applying a set of incremental modifications to the pre-computed aggregate table, wherein modifications may include any combination of inserts, deletes, and updates. The method includes defining a first table wherein each record, representing an aggregated group of changes, shows, for each aggregation function, the contributions of base table changes that are insertions for that group and the contributions of base table changes that are deletions for that group. Each record in the first table further includes the number of changes that are insertions and the number of changes that are deletions for the group represented by the record. The method includes defining the incremental modifications to the pre-computed aggregate table using some combination of information in the first table, the pre-computed aggregate table, and the one or more base tables from which the pre-computed view is derived. The method further includes identifying rows in the first table that do not contribute to the incremental modifications. The method also includes modifying the pre-computed aggregate table based only on information in the first table. The method also includes modifying the pre-computed aggregate table based only on information in the first table and the pre-computed aggregate table. 
The method includes analyzing the view definition including the type of aggregation functions and the nullability and data type of columns and expressions in the view definition to reduce or eliminate the use of information in base tables in order to define the incremental modifications to the pre-computed aggregate table.
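The first table described above can be pictured with a small Python sketch. It assumes a hypothetical view that groups by a key and computes count(*), sum, and min over a non-nullable integer column; the names `DeltaRow` and `build_first_table` and the field layout are illustrative assumptions, not the structure prescribed by the method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaRow:
    """One record of the first table: contributions for one group."""
    ins_count: int = 0             # number of changes that are insertions
    del_count: int = 0             # number of changes that are deletions
    ins_sum: int = 0               # sum contribution of insertions
    del_sum: int = 0               # sum contribution of deletions
    ins_min: Optional[int] = None  # min contribution of insertions
    del_min: Optional[int] = None  # min contribution of deletions


def build_first_table(inserted, deleted):
    """Aggregate (group_key, value) changes into one DeltaRow per group."""
    table = {}
    for key, value in inserted:
        row = table.setdefault(key, DeltaRow())
        row.ins_count += 1
        row.ins_sum += value
        row.ins_min = value if row.ins_min is None else min(row.ins_min, value)
    for key, value in deleted:
        row = table.setdefault(key, DeltaRow())
        row.del_count += 1
        row.del_sum += value
        row.del_min = value if row.del_min is None else min(row.del_min, value)
    return table
```

Keeping insert and delete contributions in separate fields, rather than netting them immediately, is what lets later steps decide per group whether the change can be applied directly.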
In general, in another aspect, the present invention provides a method for maintaining pre-computed aggregate views. The method includes receiving a pre-computed aggregate view derived from one or more base tables. The pre-computed aggregate view includes a pre-computed aggregate table and a view definition. The view definition includes aggregation functions that can be any combination of sum, sum distinct, count(*), count distinct, min, and max. The view definition may include expressions that may be nullable. The method includes receiving changes to the base table. The changes may be non-minimal. The method includes defining a first table wherein each record shows, for each aggregation function, contributions from changes that are insertions and contributions from changes that are deletions. The method includes identifying, in the first table, each row representing an aggregated group that is not currently represented in the pre-computed aggregate table, that results from more changes that are insertions than changes that are deletions, and that does not require recomputation. The method includes modifying the pre-computed aggregate table with information from the identified rows.
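The identification step in this aspect can be sketched in Python for a hypothetical view with count(*) and min. The test for "does not require recomputation" used below is an assumption of this sketch, chosen to be conservative: a new group's min is taken from the insertions only when no deleted value is as small as the smallest inserted value. The function name and dictionary layout are likewise illustrative.

```python
def new_group_rows(first_table, aggregate_table):
    """Return rows to insert into the aggregate table for brand-new groups."""
    rows = {}
    for key, d in first_table.items():
        if key in aggregate_table:
            continue  # group already represented in the aggregate table
        if d["ins_count"] <= d["del_count"]:
            continue  # not more insertions than deletions
        # Assumed conservative test: the smallest inserted value is known
        # to survive only if every deleted value is strictly larger.
        if d["del_min"] is not None and d["del_min"] <= d["ins_min"]:
            continue  # min would have to be recomputed from the base table
        rows[key] = {
            "count": d["ins_count"] - d["del_count"],
            "min": d["ins_min"],
        }
    return rows
```

Groups skipped by the last test are not lost; in this sketch they are simply left for a recomputation step that is outside the example.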
In general, in another aspect, the present invention provides a method for maintaining pre-computed aggregate views. The method includes receiving a pre-computed aggregate view derived from one or more base tables. The pre-computed aggregate view includes a pre-computed aggregate table and a view definition. The pre-computed aggregate view is self-maintainable. The method includes receiving changes to the base table. The changes may be non-minimal. The method includes defining a first table wherein each record shows, for each aggregation function, contributions from changes that are insertions and contributions from changes that are deletions. The method includes identifying in the first table each row representing an aggregated group that is not currently represented in the pre-computed aggregate table and that results from more changes that are insertions than changes that are deletions. The method includes modifying the pre-computed aggregate table with information from the identified rows.
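For the self-maintainable case, a minimal Python sketch of inserting new groups follows. It assumes a hypothetical view containing only count(*) and sum over a non-nullable column, which this example takes to be self-maintainable; the function name and the dictionary layout of both tables are assumptions.

```python
def insert_new_groups(first_table, aggregate_table):
    """Add groups that are new and have more insertions than deletions."""
    for key, d in first_table.items():
        net_count = d["ins_count"] - d["del_count"]
        if key not in aggregate_table and net_count > 0:
            # Net contributions come from the first table alone; no base
            # table access is needed for count(*) and sum.
            aggregate_table[key] = {
                "count": net_count,
                "sum": d["ins_sum"] - d["del_sum"],
            }
    return aggregate_table
```

A group whose insertions are fully cancelled by deletions has a net count of zero and is never added, which is how non-minimal changes are absorbed without preprocessing in this sketch.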
The invention can be implemented to realize one or more of the following advantages. A method in accordance with the invention allows a pre-computed aggregate view containing any combination of sum, count, min and max aggregation functions (including sum distinct, count(*) and count distinct) to be maintained incrementally even in the presence of non-minimal changes to a base table of the pre-computed aggregate view. The method correctly maintains pre-computed aggregate views involving nullable and/or non-nullable expressions in the view definition. A non-nullable column or expression is one that is guaranteed to not have null values. A nullable column or expression is one that may have null values. The method also handles multisets correctly. The method does not need preprocessing to reduce a set or sets of changes to a base table to become minimal. The changes are assumed to satisfy the property that the set of deleted rows is a subset of the combination of the set of original rows in the table and the inserted rows. In the method, changes computed and applied to the pre-computed aggregate table can be a set of inserted rows, a set of deleted rows, a set of updated rows, or a combination of these. This flexibility reduces the number of recomputations required to maintain a pre-computed aggregate view.
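The one property the changes are assumed to satisfy, that the deleted rows form a subset of the original rows combined with the inserted rows, can be checked with a short Python sketch. The function name is hypothetical, and `collections.Counter` is used as a stand-in for a bag so that multiplicities are respected.

```python
from collections import Counter


def deletes_are_consistent(base_rows, inserted_rows, deleted_rows):
    """True if every deleted row is available in the original or inserted rows."""
    available = Counter(base_rows) + Counter(inserted_rows)  # bag union (sum)
    deleted = Counter(deleted_rows)
    return all(available[row] >= n for row, n in deleted.items())
```

This condition is weaker than weak minimality: deleting a row that exists only because the same change set inserted it passes the check.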
The details of one or more implementations of the invention are set forth in the accompanying drawings and the description below. Other features and advantages of the invention will become apparent from the description, the drawings, and the claims.