The present invention relates to database systems and more particularly to concurrency control for maintenance of materialized views.
Relational databases store information in collections of tables, in which each table is organized into rows and columns. A popular application of relational database technology is data warehousing, in which an organization""s data repository is designed to support the decision-making process for the organization. Many data warehouses are characterized by a schema having a very large table called a xe2x80x9cfact tablexe2x80x9d and many smaller lookup tables called xe2x80x9cdimension tables.xe2x80x9d A fact table contains information collected for analysis and dimension tables contain information about the attributes of the data in the large fact table.
FIG. 5 illustrates an exemplary database containing two tables, a product table 500 and a sales table 510, useful for recording and organizing information about a company""s sales operation. The columns of the product table 500 hold attributes for the different products sold by the company, including a product number xe2x80x9cPRODNOxe2x80x9d 502, and a product name xe2x80x9cPNAMExe2x80x9d 404. Information about each product is stored in a row. For example, the first row is an entry for product 11, which is peanuts. The sales table 410 holds information in columns for each sale of products made by the company. Such information may include, for example, the number of tons 412 of product in the sale, and the product number 414 of the product that was sold. In this example, the sales table 510 may be considered to be a fact table because the sales information is useful in analysis of profitability, and the product table 400 is an example of a dimension table because it stores information about the product number attribute 414.
A database user retrieves information from the tables of a relational database by entering input that is converted to queries by a database application, which submits the queries to a database server. In response to receiving a query, the database server accesses the tables specified in the query to determine which information within the tables satisfies the query. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately presented to the user. Database servers are also capable of combining or xe2x80x9caggregatingxe2x80x9d information contained in different tables in response to a query. For example, one query for the exemplary database is to list, for each sale, the tonnage sold from the sales table 410 and the corresponding name of the product sold from the product table 400. This query would be useful for users who don""t remember the meaning of the product numbers.
For any given database application, the queries must conform to the rules of a particular query language. Most query languages provide users with a variety of ways to specify information to be retrieved. For example, in the Structured Query Language (SQL), the following query requests the retrieval of a list, for each sale, of the tonnage sold and the name of the product sold:
SELECT TONS, PNAME
FROM PRODUCT, SALES
WHERE PRODUCT.PRODNO=SALES.PRODNO;
This query performs a join operation on the product table 500 and the sales table 510. A join operation combines rows from two or more relational database objects, such as tables, views, or snapshots. To process a join operation, the database server combines pairs of rows that satisfy the join conditions and the other predicates. A join is performed whenever multiple tables appear in the FROM clause of query. The SELECT list of the query can reference any of the columns from any of the base objects listed in the FROM clause. Most join queries contain a WHERE clause that contains a predicate that compares two columns, each from a different joined object. Such predicates are referred to join conditions.
For various reasons, it is desirable to define views that present results of queries. For example, views are often used to provide security by hiding sensitive information or simplify commands for a user by hiding the complexity of the database. A view is a logical table, and as logical tables, views can be queried just as if they were tables. The data that views actually present, however, is extracted or derived from other database objects, which may in fact be tables, other views, or snapshots. Columns and tables that are mapped to a view are referred to herein as base columns and base tables of the view, respectively.
A view is defined by metadata referred to as a view definition, which is typically in the form of a database query. For example, to create a view xe2x80x9cRESULTSxe2x80x9d, illustrated a view 520 in FIG. 5, for presenting the results of the query of STATEMENT 1, the following STATEMENT 2 may be issued to define the view:
CREATE VIEW RESULTS AS
SELECT TONS, PNAME
FROM PRODUCT, SALES
WHERE PRODUCT.PRODNO=SALES.PRODNO;
The data presented by conventional views is gathered and derived on-the-fly from the base tables in response to queries that access the views. This data is not persistently stored after the query accessing the view has been processed. Since the data provided by conventional views is gathered from the base tables at the time the views are accessed, the data from the views will reflect the current state of the base tables. The overhead, however, associated with gathering the data from the base tables for a view every time the view is accessed may be prohibitive, especially if the defining query of the view is expensive to compute.
A materialized view, on the other hand, is a view for which a copy of the view data is stored separately from the base tables of the materialized view. Since the view data is persistently stored, the view data is not required to be regenerated every time a query accesses the view, eliminating overhead associated gathering and deriving the view. In order to provide up-to-date view data, however, materialized views must be maintained to reflect the current state of the base tables. When the base tables of a materialized view are modified, corresponding changes are made to the materialized view. Using a materialized view can lead to a cost savings compared with use of a conventional view when the materialized view presents a set of data that is infrequently changed but frequently accessed, especially when the defining query is costly to calculate.
In general, there are two approaches to causing a materialized view to reflect changes made to its base tables. One approach, referred to as a total refresh, involves discarding the current materialized view data and reissuing the defining query to regenerate the entire materialized view based on the current state of the base tables. Total refresh, thus, incurs a significant performance penalty in regenerating the view, especially for large base tables, and reduces the relative performance benefit of using the materialized view.
The other approach for maintaining a materialized view is referred to herein as incremental maintenance. With incremental maintenance, the entire materialized view is not regenerated every time a base table is changed. Instead, the database server determines what changes, if any, must be made to the materialized view data to reflect the changes made to the base tables. Incremental maintenance significantly reduces the overhead associated with maintaining a materialized view when, for example, changes to the base table only require the insertion or deletion of a single row within the materialized view.
When performing a incremental maintenance on a materialized view that is defined by a join on a plurality of base tables, it is important to control the order of operations performed by concurrent transactions to avoid the xe2x80x9cmissing updates problem.xe2x80x9d A transaction is a logical unit of work that comprises one or more databese language statements, terminated by a xe2x80x9ccommitxe2x80x9d or xe2x80x9cabortxe2x80x9d operation. Concurrent transactions are received and processed by a database server within the same interval of time, for example, simultaneously by parallel execution on multiple processors or asynchronously by interleaved execution on a single processor.
Without proper concurrency control, it is possible for incremental maintenance to erroneously miss a modification to the base tables that should be reflected in the materialized view. Referring to FIG. 6, an example of the missing updates problem is illustrated with respect to a timeline of two transactions that are concurrently executed to modify empty base tables T1 and T2 of a materialized view T12, where T1 has two columns (a, b), T2 has two columns (b, c), and T12 has two columns (a, c) and is defined by: T1 JOIN T2 ON T1.b=T2.b.
At 604, the first transaction makes a modification to base table T1, for example, to insert values (1, 2) into base table T1. At 606, incremental maintenance is performed for the modification, but, since there is no data in the empty base table T2 that satisfies the join condition, no rows are selected for insertion into the materialized view T12.
Shortly thereafter, the second transaction modifies base table T2 to insert the values (2, 3) at 610 and performs incremental maintenance at 612. The data from the first transaction is not yet visible to the second transaction, because the first transaction has not yet committed its transaction. Therefore, since base table Ti still appears to be empty to the second transaction, the incremental maintenance for the second transaction does not select any rows to insert into the materialized view T12.
At 614, the first transaction commits to make the modification to base table T1 final and visible, and, at 618, the second transaction makes its modification to base table T2 final and visible. These modifications, according to the definition of the materialized view T12, should result in the values (1, 3) being inserted therein. The materialized view T12, however, remained empty and thus does not correctly reflect the changes to the base tables T1 and T2, because the incremental maintenance by either transaction could not capture the changes by the other transaction.
One known solution to the missing updates problem is to obtain an exclusive lock on all base tables T1 and T2 for performing the incremental maintenance. A lock is a data structure that indicates that a particular process has been granted certain rights with respect to a resource. There are many types of locks. An exclusive lock is a very restrictive lock that prevents other processes from modifying the resource or obtaining other locks on the resource. In some implementations, the exclusive also prevents read accesses to the resource.
FIG. 7 depicts a timeline of two transactions modifying base tables T1 and T2 of materialized view T12 as in FIG. 6, except with exclusive locking concurrency control. At 702, the first transaction requests an exclusive lock on each base table, T1 and T2, of the materialized view T12. This request succeeds immediately, because in this scenario, both base tables T1 and T2 are unlocked. At 704, the first transaction makes a modification to insert values (1, 2) into base table T1. At 706, incremental maintenance is performed for the modification, but, since there is no data in the empty base table T2 that satisfies the join condition, no rows are selected for insertion into the materialized view T12.
Shortly thereafter, the second transaction requests an exclusive lock on base tables T1 and T2 at 708. Since base tables T1 and T2 already have been granted an exclusive lock for the first transaction, the lock request blocks, i.e. suspends, until the pre-existing exclusive lock for base tables T1 and T1 is released. As a result, the intended modification by the second transaction is delayed until the first transaction commits its changes at 714 and releases its exclusive locks at 716. After the first transaction has released the exclusive locks on base tables T1 and T2 at 716, the lock request made by the second transaction back at 708 succeeds at point 718. Thereupon, the second transaction modifies table T2 to insert the values (2, 3) at 720 and performs incremental maintenance at 722. Since the data from the first transaction was committed previously at 714, that data is visible to the second transaction. Therefore, the incremental maintenance for the second transaction is able to select the proper values (1, 3) to insert into the materialized view T12.
The effect of obtaining the exclusive locks serializes transactions to the base tables T1 and T2 of the materialized view T12 and thereby solves the missing updates problem. More specifically, the exclusive locks obtained by the first transaction force the second transaction to wait until the first transaction commits the new data, allowing the second transaction to see the changes made by the first transaction. This serialization also occurs when two transactions make modifications to different records of the same base table. For example, as illustrated in FIG. 8, a first transaction obtains exclusive locks on both base tables T1 and T2 at 802, modifies base table T1 at 804, and performs incremental maintenance at 806. Shortly thereafter, the second transaction requests exclusive locks for base tables T1 and T2 but is blocked until the first transaction commits the changes at 814 and releases the exclusive locks at 816. At 818, the exclusive lock requests for the second transaction finally succeed, thereby allowing the second transaction to modify base table T1 at 820, incrementally maintain the materialized view T12 at 822, commit its changes at 824, and release its locks at 826. Serializing transactions to a table, however, is costly because different transactions are forced to wait for one another, preventing concurrent execution and increasing response time latency.
There exists a need for a correct and efficient concurrency control methodology for managing transactions that update base tables of a materialized view. Specifically, there is a need to solve the missing updates problem for concurrent transactions to different base tables of the materialized view while reducing the overall response time latency of processing transactions to the base tables.
The present invention stems from the realization that solving the missing updates problem involves serializing concurrent transactions to different base tables of the materialized view, but most of the concurrent data warehousing transactions are made to the same table, the fact table. Therefore, it is desirable to enable concurrent updates of the fact table to proceed without serialization while serializing transactions to different base tables of the materialized view.
These and other needs are addressed by the present invention by locking the base table being updated and the other base tables in the materialized view with different locks. Use of different locks allows concurrent transactions to the same table to proceed, thereby improving performance and latencies, but forces transactions to different tables to be serialized, thereby solving the missing updates problem. Serialization and its attendant performance disadvantages are therefore greatly reduced, since serialization is not performed for most of the data warehousing transactions, i.e. made to the fact table.
Accordingly, one aspect of the invention is a computer-implemented method and a computer-readable medium bearing instructions for maintaining a materialized view, that is defined by a join of a first table and a second table, during a transaction that performs an operation to modify the first table. A first-type lock is obtained on the first table and a second-type lock is obtained on the second table, in which the first-type lock and the second-type lock are different types of locks. After obtaining the first-type lock and the second-type lock, the operation to modify the first table is performed and the materialized view is maintained based on results of performing the operation to modify the first table.
The first-type lock and the second-type lock may be defined so that the first-type lock is obtainable only when the second-type lock is not currently granted and, likewise, the second-type is obtainable only when the first-type lock is not currently granted. For example, a request for a first-type lock would block when a second-type lock is currently granted, and vice versa. In one embodiment, the first-type lock is a row exclusive table lock and the second-type lock is a share table lock. In another embodiment, the first-type lock is a write lock on a subset, such as a row or page, of the first table and the second-type lock is a read lock on the first table.
Another aspect of the invention is a method of maintaining a materialized view, defined by a join of a plurality of based tables. The method includes serially executing transactions if the transactions modify different base tables and incrementally maintain the materialized view. If, on the other hand, the transactions modify the same base table and incrementally maintain the materialized view, those transactions are concurrently executed.
Still other objects and advantages of the present invention will become readily apparent from the following detailed description, simply by way of illustration of the best mode contemplated of carrying out the invention. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the invention. Accordingly, the drawing and description are to be regarded as illustrative in nature, and not as restrictive.