The present application is related to: U.S. patent application Ser. No. 09/221,649, entitled xe2x80x9cUsing a Materialized View to Process a Related Query Containing a One to Many Lossless Joinxe2x80x9d, filed by Andrew Witkowski, on the equal day herewith, herein referred to as Witkowski, the contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to optimizing queries, and in particular, to rewriting join queries to access data through a view or from a materialized view.
2. Background of the Invention
In a database management system (DBMS), data is stored in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational database systems, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
The present invention is not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms xe2x80x9ctablexe2x80x9d, xe2x80x9crowxe2x80x9d and xe2x80x9ccolumnxe2x80x9d shall be used herein to refer respectively to the data container, record, and field.
For various reasons, it is not desirable for certain users to have access to all of the columns of a table. For example, one column of an employee table may hold the salaries for the employees. Under these circumstances, it may be desirable to limit access to the salary column to management, and allow all employees to have access to the other columns. To address this situation, the employees may be restricted from directly accessing the table. Instead, they may be allowed to indirectly access the appropriate columns in the table through a xe2x80x9cviewxe2x80x9d.
A view is a logical table. As logical tables, views may be queried by users as if they were a table. However, views actually present data that is extracted or derived from existing tables. Thus, the problem described above may be solved by (1) creating a view that extracts data from all columns of the employee table except the salary column, and (2) allowing all employees to access the view.
The data presented by conventional views is gathered and derived on-the-fly from the base tables in response to queries that access the views. The data gathered for the view is not persistently stored after the query accessing the view has been processed. Because the data provided by conventional views is gathered from the base tables at the time the views are accessed, the data from the views will reflect the current state of the base tables. However, the overhead associated with gathering the data from the base tables for a view every time the view is accessed may be prohibitive.
A materialized view, on the other hand, is a view for which a copy of the view data is stored separate form the base tables from which the data was originally gathered and derived. The data contained in a materialized view is referred to herein as xe2x80x9cmaterialized dataxe2x80x9d. Materialized views eliminate the overhead associated with gathering and deriving the view data every time a query accesses the view.
However, to provide the proper data, materialized views must be maintained to reflect the current state of the base tables. When the base tables of a materialized view are modified, computer resources must be expended to both determine whether the modifications require corresponding changes to the materialized data, and to make the required corresponding changes. Despite the high cost associated with maintaining materialized views, using a materialized view can lead to significant overall cost savings relative to a conventional view when the materialized view represents a set of data that is infrequently changed but frequently accessed.
A view is defined by metadata referred to as a view definition. The view definition contains mappings to one or more columns in the one or more tables containing the data. Columns and tables that are mapped to a view are referred to herein as base columns and base tables of the view, respectively.
Typically, the view definition is in the form of a database query. These queries, like any database query, must conform to the rules of a particular query language such as the ANSI Structured Query Language (SQL). For example, the query:
SELECT T.a FROM T WHERE T.b=1
retrieves column T.a of table T for those rows whose value in T.b equals 1. The above query includes a SELECT clause (i.e. xe2x80x9cSELECT T.axe2x80x9d), a FROM clause (i.e. xe2x80x9cFROM Txe2x80x9d), and a WHERE clause (i.e. xe2x80x9cT.b=1xe2x80x9d). The FROM clause specifies one or more tables or views from which to retrieve values. The tables or views are referred to as the FROM list. The SELECT clause specifies one or more columns in the items in the FROM list from which to retrieve values. The one or more columns are referred to as the SELECT list. The WHERE clause specifies the rows from which the values are retrieved. Specifically, the WHERE clause contains one or more logical expressions defining criteria that must be meet by the rows from which values are retrieved.
Views are often based on joins of two or more tables. A join is an operation that combines rows from two or more tables and views that meet a join condition.
FIG. 1 shows tables that are used to illustrate a join. FIG. 1 shows tables X 190, Y 170, and Z 180. For example, materialized view YZ 110 represents the results of an xe2x80x9cequijoinxe2x80x9d between tables Y 170 and Z 180. An equijoin is a particular type of join where a row from a first table is combined with one or more rows from a second table, if the value in a specified column from the first table equals a value in a specified column from the second table. For example, combining the rows from table Y 170 and table Z 180 using the join condition y.a=z.a produces materialized view YZ 110. In materialized view YZ 110, row 114 was formed by combining row 172 (i.e. columns y.a and y.b in row 114) and 182 (i.e. columns z.a and z.b in row 114) because for rows 172 and 182, the value in y.a equals the value in z.a. A join is performed whenever a join is specified by a query.
As a matter of convenience, the value in a column of a row may be referred to by the label or name of the column. For example, the expression xe2x80x9cvalue in y.a equals the value in z.axe2x80x9d may be expressed as xe2x80x9cy.a equals z.axe2x80x9d, or xe2x80x9cy.a=z.axe2x80x9d.
A join may be specified by a query by including the tables to be joined in the FROM list and including a logical expression specifying the needed join conditions in the WHERE clause. For example, the previous equijoin of tables Y 170 and Z 180 may be specified by a query as follows:
SELECT * FROM Y, Z WHERE y.a=z.a
In addition to join conditions, the WHERE clause of a join query can also contain other conditions that refer to columns of only one table. These conditions can further restrict the rows returned by the join query.
Generating joins in response to receiving join queries can require a substantial amount of processing. To reduce the amount of such processing, a database manager transforms join queries to retrieve data from the xe2x80x9cpreprocessedxe2x80x9d materialized data residing in a materialized view. Currently, database systems that include query rewrite mechanisms rewrite some types of join queries but not other types of join queries.
An example of a query not transformed under the conventional approach is a query based on a semijoin. Let W greater than -p(w,t)T denote a semijoin between tables W and T on condition p(w,t). A semijoin results in another table that consists of rows in W that match the condition p(w,t). If the condition is not relevant, a shorter notation W greater than -T is used to represent the semijoin. The first table W is referred to as the xe2x80x9cleftxe2x80x9d table, and the second table T as the xe2x80x9crightxe2x80x9d table with respect to the semijoin. Note that the notation W greater than -T represents the same semijoin as the notation T- less than W. Typically, the condition of a semijoin requires that a column from the left table match the value of a column in the right table. Structures shown in FIG. 2 are used as an example to illustrate a semijoin.
FIG. 2 shows table W 270 and T 280 and semijoin table 250. Semijoin table 250 represents the results of a semijoin between table W 270 and table T 280 through the condition w.a equals t.a. Row 272 and row 274 are reflected in semijoin table 250 because each match the condition. With respect to row 274, although two rows in table T 280 match the condition w.a=t.a (i.e. row 284 and row 286), only one row in semijoin table 250 reflects row 274. No row in table T 280 meets the join condition with respect to row 276.
Materialized view WTU 220 is an example of a materialized view that contains data that may used to satisfy the above semijoin query. Materialized view WTU 220 is based on join W greater than  less than T greater than  less than U, where w.a=t.a and t.b=u.b. Materialized view WTU 220 contains row 272 and two copies of row 274, which represent data that may used to satisfy the above mentioned semijoin query.
Because using the xe2x80x9cpreprocessedxe2x80x9d materialized data of a materialized view frequently saves the computer resources required to process queries, it is desirable to provide a method of satisfying a query from the materialized data of a materialized view that contains data that may used to satisfy semijoin queries.
A method and apparatus for transforming queries is described. According to an aspect of the present invention, a select-project-join query Q that specifies at least one semijoin is rewritten into another query that references the materialized view M. A common section shared by the select-project-join query and the materialized view includes the tables being semijoined.