This application relates to processing and maintaining precomputed views in a database.
A database is a collection of data, usually pertaining to a well defined purpose. The data typically has a known format which is defined by metadata. The data in a database is typically stored, retrieved, and modified by a database management system. A relational database management system (RDBMS) stores information in tables, in rows and columns of data, and conducts searches. The rows of an RDBMS table typically represent records (collections of information about separate items) and the columns typically represent fields (particular attributes of a record). A relational database management system may respond to user queries on the database by matching information from a field in one table with information in a corresponding field of another table, and producing a third table that combines data from both tables. Users typically query, update and manage the RDBMS using a data sublanguage (e.g., SQL). A data sublanguage is one that may be used in association with another computer language for the specialized purpose of accessing data. There are many relational data sublanguages, including QUEL from Relational Technology, Inc. (ASK/Ingres), and RDML from Digital Equipment Corporation. SQL has been formally standardized for access to relational databases and is widely implemented and used, although there are many variations of SQL (e.g., RISQLxc3x96 from Informix Software, Inc. of Menlo Park, Calif.).
In the relational database model, a relation can be represented by a table. A relation is typically made up of one or more attributes (represented as columns in SQL), each of which is associated with a data type (e.g., a character string, an integer, or a floating point number). Data typically is stored in a table in tuples (rows in SQL).
Referring to FIGS. 1A-1D, the relational database tables Product, Sales, Time and Result contain columns of attributes and rows of data related to those attributes. For example, the Product table of FIG. 1A, includes prod ID, product type, and bar code. Specific operations can be performed on these tables. One such operation is selection, which identifies a specific row or rows in a table. Selection is typically done by specifying one or more predicates that are used to filter a table to identify rows for which the predicate is true. Predicates are typically found in the WHERE clause of an SQL query. For example, a selection operation could request the selection of prod ID of 1, which would select the first row of the Product table. Another operation in the relational database model is called the join. A join operation is a way of combining data from two tables which is typically based on the relationships between the data in those tables. The Product table identifies the product type by prod ID, and the Sales table identifies the amount of dollars associated with each prod ID/time (time key) combination. The Product table and the Sales table may be joined through their prod ID columns. The Sales table also associates a time key with each row, and the Time table relates a day with each time key. Accordingly, the Sales table and the Time table may be joined through the time key values.
FIG. 2 shows relationships between a fact table and its dimension tables. Fact tables and dimension tables are a subset of detail tables. Fact tables are detail tables which record events (e.g., a sales event). The tables in which information related to the sales event (e.g., a time, a store, and a product) is stored are the dimension tables of the associated fact table. For example, Time table, Store table, and Product table are the dimension tables associated with the Sales table. The Class outboard (dimension of a dimension) table eliminates redundancies. This feature is referred to as normalization.
Another concept in relational database models is functional dependency. A functional dependency is a many-to-one relationship between columns of values in database tables. A functional dependency from column x to column y is a constraint that requires two rows to have the same value for the y column if they have the same value for the x column. A functional dependency may be explicitly declared by a user, such as the database administrator.
Further, relational database models provide for an aggregation query, which is a query that requires the summarization or consolidation of rows in database tables, typically using a set function, such as SUM or COUNT, and an optional GROUP BY clause. An aggregate table is typically a L table that summarizes or consolidates detail level records from other database tables.
SQL enables users to define a virtual table (a xe2x80x9cviewxe2x80x9d) and to save that definition in a database as metadata. A view usually is not physically materialized (or xe2x80x9cprecomputedxe2x80x9d) until it is needed (e.g., when a SQL statement references the view). The metadata about the view can include information such as the name of the view, the names and data types of each column and the SQL text describing the data the view produces. The metadata about the view is typically stored in the database""s metadata, but the actual data that the user will see in the view are typically not physically stored. Rather the data typically are stored in detail tables from which the view""s rows are derived. In the case of a precomputed view, data typically is stored in an associated precomputed table. In general, operations can be performed against views just as they can be performed against detail tables.
A user may request information such as how many units of cereal X were sold on Jan. 1, 1999. The result of that query may be derived from the Product table (FIG. 1A), the Sales table (FIG. 1B), and the Time table (FIG. 1C). However, deriving the answers to each question from various tables can be highly inefficient and time consuming. In order to increase efficiency, a database administrator may predict questions which are most likely to be asked and precompute a table which includes the answers to these likely questions prior to these questions actually being asked. For example, a store""s database administrator may determine that a frequently asked question is the total sum of sales of a given product over a period of one day (sum_dollars). Accordingly, the database administrator may create one or more precomputed tables (FIG. 1D) and a precomputed view associated with that table.
These precomputed tables need to be maintained to ensure that the precomputed tables store current data. When changes are made to detail tables, the precomputed view can become xe2x80x9cout-of-syncxe2x80x9d with the detail data. The maintenance of a large number of precomputed tables is time consuming and requires an administrator to invest time updating the precomputed view. When a query is asked and the answer is not directly available from a stale precomputed table, the answer to such a query typically is to execute the query from one or more detail tables, a process which may be highly time consuming and computationally intensive.
The invention provides automatic maintenance of precomputed views. The invention maintains a precomputed view corresponding to detail data in a database server by: determining a maintenance plan to refresh the precomputed view when detail data is changed; integrating the maintenance plan with a query execution plan in the database server; and executing the query execution plan.
The precomputed view can be maintained either incrementally using the changes to detail data or by recomputing the view directly from detail data or other precomputed views.
Advantages of the invention include one or more of the following. The maintenance process is performed in a transparent, comprehensive, efficient and robust manner. Views are maintained automatically within the same transaction that changes detail data. Users are completely insulated from the maintenance process and do not have to write special scripts or schedule jobs for maintaining views.
Further, the maintenance process is comprehensive: All supported precomputed views are automatically maintained when detail data changes. The maintenance is supported for both server-based and loader-based inserts, deletes, and updates and with any interface that changes data. Options are available for turning off automatic maintenance for all and individual views. Further, users can use a rebuild facility to rebuild the views from detail data at a later point in time.
The view maintenance system is efficiently integrated with a warehouse server and allows increases the scope for optimizing the maintenance strategies. The invention also leverages many existing loading and efficient query processing techniques in the system. The maintenance system uses the capabilities of a rewriter in determining dependency information among views. This dependency information is used in determining the correct maintenance strategy including sequencing and parallelizing maintenance of multiple views. The system also uses foreign-key/primary-key relationships, hierarchy information, nullability of columns and other metadata to produce more efficient maintenance plans.
Yet another advantage is the system""s robustness. The view maintenance system is designed to maintain data integrity. For example, any detail data modifications that fail referential integrity or uniqueness constraints will not be propagated to precomputed views. The system can mark any views for which maintenance did not succeed as invalid, thus ensuring correctness of query results.
Other features and advantages will become apparent from the following description, including the drawings and the claims.