This application relates to processing (including managing) precomputed views and generating candidate views in response to user queries on a database.
A database is a collection of data, usually pertaining to some reasonably well defined purpose. The data typically has a known format which is defined by metadata The data in a database is typically stored, retrieved, and modified by a database management system. A relational database management system stores information in tables, in rows and columns of data, and conducts searches. In a relational database, the rows of a table typically represent records (collections of information about separate items) and the columns typically represent fields (particular attributes of a record). A relational database management system may respond to user queries on the database by matching information from a field in one table with information in a corresponding field of another table, and producing a third table that combines data from both tables. For example, if one table contains the fields EMPLOYEE-ID, LAST-NAME, FIRST-NAME, DEPT-ID, SALARY and HIRE-DATE, and another table contains the fields DEPT-ID, DEPT-NAME, and LOCATION, a relational database management system may match the DEPT-ID fields in the two tables to find the names of all employees working in a department in a specified location.
Users typically query, update and manage a relational data base using a data sublanguage (e.g., SQL). A data sublanguage is one that may be used in association with another computer language for the specialized purpose of accessing data. There are many relational data sublanguages, including QUEL from Relational Technology, Inc. (ASK/Ingres), and RDML from Digital Equipment Corporation. SQL has been formally standardized for access to relational databases and is widely implemented and used, although there are many variations of SQL (e.g., RISQL(trademark) from Informix Software, Inc. of Menlo Park, Calif.).
In the relational database model, the basic unit of data is the relation. In SQL, the relation is represented by a table. A relation is typically made up of one or more attributes (represented as columns in SQL), each of which is associated with a data type (e.g., a character string, an integer, or a floating point number). Data typically is stored in a table in tuples (rows in SQL).
Referring to FIGS. 1A-1D, the relational database tables Product, Sales, Time and Result contain columns of attributes and rows of data related to those attributes. For example, the Product table of FIG. 1A, includes prod ID, product type, and bar code. Specific operations can be performed on these tables. One such operation is selection, which identifies a specific row or rows in a table. Selection is typically done by specifying one or more predicates that are used to filter a table to identify rows for which the predicate is true. Predicates are typically found in the WHERE clause of an SQL query. For example, a selection operation could request the selection of prod ID of 1, which would select the first row of the Product table. Another operation in the relational database model is called the join. A join operation is a way of combining data from two tables which is typically based on the relationships between the data in those tables. The Product table identifies the product type by prod ID, and the Sales table identifies the amount of dollars associated with each prod ID/time (time key) combination. The Product table and the Sales table may be joined through their prod ID columns. The Sales table also associates a time key with each row, and the Time table relates a day with each time key. Accordingly, the Sales table and the Time table may be joined through the time key values.
FIG. 2 shows relationships between a fact table and its dimension tables. Fact tables and dimension tables are a subset of detail tables. Fact tables are detail tables which record events (e.g., a sales event). The tables in which information related to the sales event (e.g., a time, a store, and a product) is stored are the dimension tables of the associated fact table. For example, Time table, Store table, and Product table are the dimension tables associated with the Sales table. The Class outboard (dimension of a dimension) table eliminates redundancies by repeated information (the relationships between products and classes) in a separate table. This feature is referred to as normalization.
Another concept in relational database models is functional dependency. A functional dependency is a many-to-one relationship between columns of values in database tables. A functional dependency from column x to column y is a constraint that requires two rows to have the same value for the y column if they have the same value for the x column. A functional dependency may be explicitly declared by a user, such as the database administrator.
Further, relational database models provide for an aggregation query, which is a query that requires the summarization or consolidation of rows in database tables, typically using a set function, such as SUM or COUNT, and an optional GROUP BY clause. An aggregate table is typically a table that summarizes or consolidates detail level records from other database tables.
SQL enables users to define a virtual table (a xe2x80x9cviewxe2x80x9d) and to save that definition in a database as metadata. A view usually is not physically materialized (or xe2x80x9cprecomputedxe2x80x9d) until it is needed (e.g., when a SQL statement references the view). The metadata about the view can include information such as the name of the view, the names and data types of each column and the SQL text describing the data the view produces. The metadata about the view is typically stored in the database""s metadata, but the actual data that the user will see in the view are typically not physically stored. Rather the data typically are stored in detail tables from which the view""s rows are derived. In the case of a precomputed view, data typically is stored in an associated precomputed table. In general, operations can be performed against views just as they can be performed against detail tables.
A user may request information such as how many units of cereal X were sold on Jan. 1, 1999. The result of that query may be derived from the Product table (FIG. 1A), the Sales table (FIG. 1B), and the Time table (FIG. 1C). However, deriving the answers to each question from various tables can be highly inefficient and time consuming. In order to increase efficiency, a database administrator may predict questions which are most likely to be asked and precompute a table which includes the answers to these likely questions prior to these questions actually being asked. For example, a store""s database administrator may determine that a frequently asked question is the total sum of sales of a given product over a period of one day (sum_dollars). Accordingly, the database administrator may create the Precomputed table (FIG. 1D) and a precomputed view associated with that table.
The database administrator typically can not be expected to anticipate all common questions likely to be asked and even if all queries were known it is not feasible to precompute/materialize all results. When a query is asked and the answer is not directly available from a precomputed table, the answer to such a query typically is derived from one or more detail tables, a process which may be highly time consuming and computationally intensive.
The invention features methods and apparatus for processing precomputed views for answering user queries on a database. In accordance with one aspect, a user query on the database is analyzed, and a candidate view based upon the user query is generated.
Embodiments may include one or more of the following features.
A log record may be generated for the candidate view. Log record generation may include a determination of whether the query was rewritten to use a precomputed view. Log record generation may include a determination of whether rewritten user queries include aggregation. Log record generation may include an identification of a candidate view that could be used to rewrite a user query.
The candidate view may be generated in SQL. The candidate view may be generated based upon meta data (including, e.g., primary key information, foreign key information relating two or more tables in the database, and user-specified hierarchies). The candidate view may be generated based upon functional dependencies in the database.
The candidate view may be generated based upon capabilities of the query processing system. For example, the generated candidate view may be usable by a query rewriting facility to rewrite the user database query used to generate the candidate view. The candidate view may be generated for a user database query that has been rewritten. The candidate view may be generated when additional aggregation is or is not present in the rewritten query. One or more additional candidate views may be generated based upon the database query.
The generated candidate view also may be characterized. Candidate view characterization may include generating a report that reflects current aggregation strategy effectiveness. Candidate view characterization may include generating a report that includes information about the generated candidate view. Candidate view characterization may include generating a measure of the benefit of the candidate view relative to other views for the database.
Among the advantages of the invention are the following.
The invention enables an advising facility to provide advice on views that do not currently exit but that could be precomputed and used by a rewriting facility to rewrite user queries, thereby improving query processing performance. The invention enables the advising facility to analyze only a subset of all possible precomputed views based upon user query history. A candidate view may be generated in SQL, allowing SQL to be used directly in creating an associated precomputed view. The candidate view also may be usable by a rewriting facility to rewrite the query from which the candidate view was generated (if the view was precomputed).
Other features and advantages will become apparent from the following description, including the drawings and the claims.