The present invention is directed generally to the field of automated database systems. More particularly, the invention relates to a database management system providing support for database queries on multiple data sets. Still more particularly, the invention concerns a strategy for supporting data record updating and deleting based on the position of an iterator, such as a cursor, where the data records span multiple database tables. The database tables may be partitioned or non-partitioned.
Database management systems are well known. These systems are advantageously utilized to provide rapid access to large volumes of data that are stored and maintained in a digital processing system. A relational database management system (RDBMS) is a computer database management system that uses relational techniques for storing and retrieving data. Relational databases are computerized information storage and retrieval systems in which data in the form of tables (formally denominated "relations") are typically stored for use on disk drives or similar mass data stores. A table includes a set of rows (formally denominated "tuples" or "records") spanning several columns (formally denominated "attributes"). Each column in a table includes "restrictions" on the data contents thereof and may be designated as a primary or foreign key. Reference is made to C. J. Date, An Introduction to Database Systems, 6th edition, Addison-Wesley Publishing Co. Reading, Mass. (1995) for an comprehensive general treatment of the relational database art.
An RDBMS is structured to accept commands to store, retrieve, update and delete data using high-level query languages such as the Structured Query Language (SQL). The term "query" denominates a set of commands for retrieving data from a stored database. These commands may be entered interactively by a user, or may be executed as part of an application program. In the latter case, the SQL statements are physically "embedded" within the application program source code, so as to be intermixed with the statements of the host language. The SQL standard has been promulgated by the International Standards Association since 1986. Reference is made to the SQL-92 standard "Database Language SQL" published by the ANSI as ANSI X3.135-1992 and published by the ISO as ISO/IEC 9075:1992 for the official specification of the 1992 version of the Structured Query Language. Reference is also made to James R. Groff et al. (LAN Times Guide to SQL, Osborne McGraw-Hill, Berkeley, Calif. 1994) for a lucid treatment of SQL-92.
SQL queries can be designed to retrieve only a single data item from a database table or a set of multiple data items. In the latter operation, a mechanism is required for sequentially accessing the database table one row at a time. The mechanism that has evolved for this purpose is a cursor. A cursor is an iterator construct used to scan and process a set of data (i.e., records, keys, tuples, etc., satisfying certain conditions), one at a time. RDBMSs implement two types of cursors: user cursors and system cursors.
A user cursor directly corresponds to a cursor defined in a user application using an SQL DECLARE "x" CURSOR statement. This statement defines a cursor called x having an associated table expression for performing an operation on the data table. A looping FETCH statement is then used to retrieve data one row at a time from the resulting set and assign the data to specified variables. UPDATE and DELETE are separate dataset modification operations which can be performed as the cursor iterates from row to row, provided the cursor is defined as being updatable in the DECLARE statement.
System cursors are defined by the RDBMS and used internally to produce the results needed to satisfy the user's query. One or more system cursors might be used to produce the output corresponding to a single user cursor. An access path is used to implement a system cursor. The most common access paths are sequential scans of a table's data pages (a table scan cursor (TSC)) and range scans on a B+-tree index (an index scan cursor (ISC)).
Both system cursors and user cursors might operate over permanent data as well as temporary data (e.g., intermediate or final results stored in temporary tables or work files). Even if a user cursor requires accessing only a single table's data, multiple system cursors might have to be used, due to the exploitation of query processing techniques like index ANDing/ORing.
As indicated, in SQL, users can issue UPDATE or DELETE statements using an existing cursor position to identify the record to update or delete. These "positioned" UPDATE and DELETE statements are achieved through the CURRENT OF CURSOR semantics in SQL.
For example, consider the following SQL statements that (1) declare an updatable cursor, (2) open the cursor, (3) perform a FETCH operation and then (4) perform an UPDATE or DELETE operation in a row where the cursor is currently positioned:
______________________________________ declare X cursor for select [*] from [table/view] where [predicates] for update; open X; fetch X; update [Table/View] set [columns = new values] where current of X; or delete from [Table/View] where current of X; ______________________________________
Note that "current of X" identifies the current record on which the Cursor X is positioned. The UPDATE and DELETE SQL statements instruct the DBMS to perform an update or delete operation on that record, respectively. It should also be noted that the FETCH operation is separate from the UPDATE and DELETE operations. The FETCH operation controls the cursor by changing its position in the table. The UPDATE and DELETE operations are passive in the sense that they do not control the cursor. They must, however, obtain information about the cursor's position in order to perform their operations.
The challenge in CURRENT OF CURSOR UPDATES and DELETES is getting information to the UPDATE and DELETE operations about the position of the record that is currently associated with the cursor X in the database. In conventional DBMSs, a predetermined global data area (GDA) exists and stores information about the table on which a cursor is positioned. This information is used for processing the UPDATE/DELETE CURRENT of CURSOR commands. At SQL statement preparation time (i.e., compile time), the table being updated/deleted is identified and the memory location of its table descriptor is stored in the GDA as a table descriptor pointer. The table descriptor is a data structure that stores table state information as a table is accessed during a query. When a record is fetched from the table, the record ID information is recorded by the FETCH operation in the table descriptor. Upon executing a CURRENT of CURSOR UPDATE/DELETE SQL statement, the DBMS will consult the table descriptor pointer in the GDA to locate the table descriptor, and then obtain the correct table ID and record ID from the table descriptor to perform the required update/delete operation.
Performing CURRENT of CURSOR UPDATES/DELETES on a database becomes problematic when dealing with multiple tables. SQL permits users to defme a VIEW of a UNION of database tables. Creating a VIEW of a UNION can be useful in many situations. For example, it can be used when historical data stored in many tables is required for analysis.
A UNION of database tables presents special problems for CURRENT of CURSOR UPDATES and DELETES. In that case, the record that cursor X is positioned on may belong to any of the tables that are part of the UNION. The question is how to correctly identify the table where the current record originated.
Assume, for example, that there are two tables, T1(C1,C2) and T2(C1,C2), where C1 and C2 refer to columns in each table. Assume further that a VIEW representing a UNION of column 1 of T1 and T2 is declared, as follows:
______________________________________ create view V1(C1) as (select C1 from T1 union all select C1 from T2); ______________________________________
A cursor X can now be declared on the VIEW V1, and used to FETCH data and UPDATE or DELETE from V1 where CURRENT of CURSOR, as follows:
______________________________________ /* Open a cursor on the view */ declare X cursor for SELECT * FROM V1 WHERE C1 &gt; 0 for update; open X; fetch X; update from V1 set C1 = 1 where current of X; or delete from V1 where current of X; ______________________________________
In the foregoing example, the record X is positioned on may belong to Table T1 or Table T2 in the VIEW V1, depending on the manner in which the data sets in T1 and T2 satisfy the predicate criterion "C1&gt;0". Moreover, in an MPP (shared nothing) database system, each table may be partitioned over a group of processing nodes having unique node-IDs. So, not only the record ID changes with each fetch, but the table-ID and node-ID may also change. The existing method of statically determining the update table to record in the GDA is no longer adequate. A better method is desirable for determining the changing table-ID, node-ID and record-ID.
A similar problem exists for positioned update and delete operations on a grouping of data sets resulting from a JOIN operation:
For example, given table T1 and T2 as defined previously, and a view V1 defined as:
create view V1 (C1, C2) as (select a.C1, b.C1 from T1, T2); PA1 update from V1 set C1=10, C2=20 where current of X; PA1 or PA1 delete from V1 where current of X;
a cursor X can be declared on the view V1, and used to FETCH and UPDATE or DELETE from V1 as follows:
______________________________________ declare X cursor for select * from V1 for update; open X; fetch X; ______________________________________
In the foregoing example, the record X is positioned on belongs to both Table T1 and Table T2 in the VIEW V1. In order to perform the update and delete, the table-IDs and record-IDs of both TABLE T1 and TABLE T2 must be determined in an efficient manner similar to the UNION scenario.
Accordingly, a need exists for a database management system that supports positioned operations, such as UPDATE operations and DELETE operations, on a grouping of data sets, such as a UNION or a JOIN of database tables, or the like, in a database. What is required is a database management system that provides a method for determining the correct position, such as the table-ID, node-ID, and record-ID, associated with an iterator, such as an application cursor, and providing that information to an updatable operation such as an UPDATE operation or a DELETE operation, in an efficient manner with minimal processing overhead. It would be desirable to implement the foregoing enhancements in both a single node database system and a multi-node database system, such as a Shared-Nothing (SN) Parallelism Environment.