This application relates to server integrated systems and methods for processing (including managing) precomputed views for responding to user queries on a database.
A database is a collection of data, usually pertaining to some reasonably well defined purpose. The data typically has a known format which is defined by metadata The data in a database is typically stored, retrieved, and modified by a database management system. A relational database management system stores information in tables, in rows and columns of data, and conducts searches. In a relational database, the rows of a table typically represent records (collections of information about separate items) and the columns typically represent fields (particular attributes of a record). A relational database management system may respond to user queries on the database by matching information from a field in one table with information in a corresponding field of another table, and producing a third table that combines data from both tables. For example, if one table contains the fields EMPLOYEE-ID, LAST-NAME, FIRST-NAME, DEPT-ID, SALARY and HIRE-DATE, and another table contains the fields DEPT-ID, DEPT-NAME, and LOCATION, a relational database management system may match the DEPT-ID fields in the two tables to find the names of all employees working in a department in a specified location.
Users typically query, update and manage a relational data base using a data sublanguage (e.g., SQL). A data sublanguage is one that may be used in association with another computer language for the specialized purpose of accessing data. There are many relational data sublanguages, including QUEL from Relational Technology, Inc. (ASK/Ingres), and RDML from Digital Equipment Corporation. SQL has been formally standardized for access to relational databases and is widely implemented and used, although there are many variations of SQL (e.g., RISQL(trademark) from Informix Software, Inc. of Menlo Park, Calif.).
In the relational database model, the basic unit of data is the relation. In SQL, the relation is represented by a table. A relation is typically made up of one or more attributes (represented as columns in SQL), each of which is associated with a data type (e.g., a character string, an integer, or a floating point number). Data typically is stored in a table in tuples (rows in SQL).
Referring to FIGS. 1A-1D, the relational database tables Product, Sales, Time and Result contain columns of attributes and rows of data related to those attributes. For example, the Product table of FIG. 1A, includes prod ID, product type, and bar code. Specific operations can be performed on these tables. One such operation is selection, which identifies a specific row or rows in a table. Selection is typically done by specifying one or more predicates that are used to filter a table to identify rows for which the predicate is true. Predicates are typically found in the WHERE clause of an SQL query. For example, a selection operation could request the selection of prod ID of 1, which would select the first row of the Product table. Another operation in the relational database model is called the join. A join operation is a way of combining data from two tables which is typically based on the relationships between the data in those tables. The Product table identifies the product type by prod ID, and the Sales table identifies the amount of dollars associated with each prod ID/time (time key) combination. The Product table and the Sales table may be joined through their prod ID columns. The Sales table also associates a time key with each row, and the Time table relates a day with each time key. Accordingly, the Sales table and the Time table may be joined through the time key values.
FIG. 2 shows relationships between a fact table and its dimension tables. Fact tables and dimension tables are a subset of detail tables. Fact tables are detail tables which record events (e.g., a sales event). The tables in which information related to the sales event (e.g., a time, a store, and a product) is stored are the dimension tables of the associated fact table. For example, Time table, Store table, and Product table are the dimension tables associated with the Sales table. The Class outboard (dimension of a dimension) table eliminates redundancies by repeated information (the relationships between products and classes) in a separate table. This feature is referred to as normalization.
Another concept in relational database models is functional dependency. A functional dependency is a many-to-one relationship between columns of values in database tables. A functional dependency from column x to column y is a constraint that requires two rows to have the same value for the y column if they have the same value for the x column. A functional dependency may be explicitly declared by a user, such as the database administrator.
Further, relational database models provide for an aggregation query, which is a query that requires the summarization or consolidation of rows in database tables, typically using a set function, such as SUM or COUNT, and an optional GROUP BY clause. An aggregate table is typically a table that summarizes or consolidates detail level records from other database tables.
SQL enables users to define a virtual table (a xe2x80x9cviewxe2x80x9d) and to save that definition in a database as metadata. A view usually is not physically materialized (or xe2x80x9cprecomputedxe2x80x9d) until it is needed (e.g., when a SQL statement references the view). The metadata about the view can include information such as the name of the view, the names and data types of each column and the SQL text describing the data the view produces. The metadata about the view is typically stored in the database""s metadata, but the actual data that the user will see in the view are typically not physically stored. Rather the data typically are stored in detail tables from which the view""s rows are derived. In the case of a precomputed view, data typically is stored in an associated precomputed table. In general, operations can be performed against views just as they can be performed against detail tables.
A user may request information such as how many units of cereal X were sold on Jan. 1, 1999. The result of that query may be derived from the Product table (FIG. 1A), the Sales table (FIG. 1B), and the Time table (FIG. 1C). However, deriving the answers to each question from various tables can be highly inefficient and time consuming. In order to increase efficiency, a database administrator may predict questions which are most likely to be asked and precompute a table which includes the answers to these likely questions prior to these questions actually being asked. For example, a store""s database administrator may determine that a frequently asked question is the total sum of sales of a given product over a period of one day (sum_dollars). Accordingly, the database administrator may create the Precomputed table (FIG. 1D) and a precomputed view associated with that table.
The database administrator typically can not be expected to anticipate all common questions likely to be asked and even if all queries were known it is not feasible to precompute/materialize all results. When a query is asked and the answer is not directly available from a precomputed table, the answer to such a query typically is derived from one or more detail tables, a process which may be highly time consuming and computationally intensive.
The invention features server-integrated methods and apparatus for processing precomputed views for answering user queries on a database.
In accordance with one aspect, the invention features a system for answering user queries on a database stored on a server, comprising: a query processing system integrated into the server and configured to respond to user queries; and a query processing system integrated into the server and configured to process precomputed results of user queries on the database.
Embodiments may include one or more of the following features.
The query processing system may be configured to compute precomputed views. The query processing system may be configured to update precomputed views to reflect changes in the database. The query processing system may be configured to manage precomputed views. The query processing system may include a rewriter configured to rewrite user queries using precomputed views. The query processing system may be configured to provide information to users relating to precomputed views and user query history.
The query processing system may include an advisor configured to analyze precomputation strategy effectiveness. The advisor may be configured to generate a report reflecting current precomputation strategy effectiveness. The generated report may include a measure of the benefit of a precomputed view relative to other precomputed views. The generated report may include a measure of the benefit of a precomputed view relative to candidate views. The advisor may be configured to generate a report reflecting precomputed view utilization.
The query processing system may be configured to define an analysis space consisting of a subset of all possible views for the database based upon user query history. The query processing system may be configured to generate a candidate view. The query processing system is configured to utilize database meta data (e.g., functional dependencies).
In another aspect, the invention features a method of answering user queries on a database stored on a server, comprising: responding to user queries received at the server; and processing, at the server, precomputed results of user queries on the database.
Embodiments may include one or more of the following features.
Precomputed views may be computed at the server. Precomputed views may be updated at the server to reflect changes in the database. Precomputed views may be managed at the server. User queries may be rewritten at the server using precomputed views. Information relating to precomputed views and user query history may be provided to users from the server.
Precomputation strategy effectiveness may be analyzed at the server. A report reflecting current precomputation strategy effectiveness may be generated at the server. The generated report may include a measure of the benefit of a precomputed view relative to other precomputed views. The measure may reflect the benefit of a precomputed view relative to candidate views. A report reflecting precomputed view utilization may be generated at the server.
An analysis space consisting of a subset of all possible views for the database based upon user query history may be defined at the server. A candidate view may be generated at the server. Database meta data (e.g., functional dependencies) may be utilized at the server.
Among the advantages of the invention are the following.
Aggregation information is stored in the system tables along with all the other metadata for the database, making knowledge of all database activity centralized. The result of this integration is consistency and the ability to use the enhanced meta data for query compilation and optimization regardless of whether precomputed views are used. For example, if aggregate table data is out of synch with detail data, the system knows about it instantly and does not use the table to rewrite queries (unless requested to do so). In addition, optimization strategies are known to the query processing system as a result of full integration. Server integration enables precomputed tables to be automatically maintained. This feature also enables users to access advisory features and meta data information using an SQL interface. This features also allows the advisory facility to use the query processing capabilities located at the server, and allows front end tools to use the capabilities of the precomputed view processing system. In accordance with one aspect of the invention, precomputed data is stored in tables that allow the definition of relationships between precomputed results and detail data (e.g., foreign key relationships). Server integration also allows the query processing system to access the logical and physical database design options that are provided by the database server.
Other features and advantages will become apparent from the following description, including the drawings and the claims.