The present invention relates generally to a relational database system that has been extended to perform operations on a continuous stream of tuples, and particularly to a system and method for providing streaming access to rows in a partitioned database table.
Traditional relational database systems receive commands falling into two categories: data definition commands (DDL) and data manipulation commands (DML). Data manipulation commands can be categorized as either queries that read data in the database or update statements that insert, delete or update data in the database base. Traditional database systems are designed and optimized for those operations. These systems are not designed and optimized for allowing applications to register to receive notifications when new data becomes available or existing data is updated.
However, receiving such notifications is essential for many applications. The need for such services has caused most transaction service vendors to extend their systems by including separate, special purpose resource managers for queue management and publish/subscribe services. These resource managers allow applications to enqueue and dequeue requests for information. Publish/subscribe services allow applications to publish and subscribe to notifications.
In such systems, transactions access both an SQL database system (which may be considered to be a type of resource manager) as well as resource managers for queuing and/or publish/subscribe services. As a result, the SQL database system and the other resource managers have to participate together in an expensive two-phase commit protocol. Due to the lack of integration between the resource managers and the database system, the SQL compiler cannot optimize access to both the data stored by the queuing or publish/subscribe resource manager and the data stored in the database, and cannot perform joins across the different data sources.
To avoid subjecting applications to the two-phase commit protocol, some SQL database vendors (e.g., Oracle and Sybase) have integrated transactional queuing and publish/subscribe services into their database products. While their implementations remove the need for a two-phase commit protocol, these implementations use special purpose objects for queues and publication channels. These implementations prevent queues and publication channels from being accessed as part of SQL statements, unless the user is willing to forgo important characteristic properties of queue or notification channels, particularly suspending execution when all qualifying tuples have been returned and automatically resuming execution when new tuples become available. It also effectively prevents the SQL compiler in these systems from optimizing access to notifications and SQL data.
The present invention provides transactional queuing and publish/subscribe extensions to an SQL database infrastructure. These extensions do not introduce any special objects. Applications access regular SQL database tables, including tables used as queues or publication channels. Applications use SQL select statements to subscribe and/or dequeue notifications. Furthermore, applications use SQL insert and update statements to publish notifications. These extensions remove the need for a two-phase commit protocol, allow applications to perform join operations on data from different sources, and allow the SQL compiler to optimize access to both notifications and xe2x80x9cnormalxe2x80x9d SQL data, using previously developed query optimization techniques. Further, execution of an SQL statement that accesses a data stream is blocked when all qualifying tuples have been processed and then resumes once new notifications become available.
As indicated above, the present invention extends traditional relational database systems to perform operations on a continuous stream of tuples, while retaining the traditional benefits of such systems, such as set oriented access using relational operators, transactional protection, and so on. The stream of tuples may represent queue entries, or notifications.
A simple example will assist in understanding the difference between traditional and streaming queries. A traditional query for viewing or otherwise processing records relating to sales by a particular salesperson might be:
select * from SalesInfo where salesperson=xe2x80x9cJohnson, Maryxe2x80x9d.
This query would retrieve all records from the table named xe2x80x9cSalesInfoxe2x80x9d having a value of xe2x80x9cJohnson, Maryxe2x80x9d in the salesperson field (column). Only records in the table at the time the query is submitted will be reported by this query.
However, if the user wants the system to continuously monitor new sales entries for a particular salesperson, the user of a traditional relational database system cannot define a simple SQL statement to perform that function, since monitoring features are not an integrated part of any of the widely used xe2x80x9cdialectsxe2x80x9d of SQL. In contrast to that, the present invention allows users to issue an SQL select statement that performs this function:
select * from stream(SalesInfo) where salesperson=xe2x80x9cJohnson, Maryxe2x80x9d;
Using the present invention, when an SQL statement specifies stream access, the execution of the SQL statement never returns an xe2x80x9cend of dataxe2x80x9d indicator. Rather, when all qualifying tuples in the specified table have been processed, continued execution is blocked, and then resumes when more qualifying tuples become available.
In a database management system, when the table to be accessed by a statement is partitioned, a respective partition scan operator is used to access rows of each respective database table partition. A fan out operator receives requests from a calling application to access rows from table, and responds by sending requests for rows to the partition scan operators. The fan out operator subsequently receives rows of the table from the partition scan operators and sends the received rows of the database table to another node in the execution tree. Each of the partition scan operators responds to the requests for rows by returning to the fan out operator qualifying rows, if any, from the corresponding table partition. If the statement being executed includes an embedded delete or update operation, the qualifying rows are deleted or updated at the same time that the qualifying rows are returned to the fan out operator. The transaction associated with the delete or update operation is not allowed to commit until all rows deleted or updated by the partition scan operators have been sent to the calling application.
In another aspect of the present invention, the SQL compiler and SQL executor of a relational database system are extended to process operations on streams of tuples and to access regular database tables as continuous streams of tuples. In particular, a new table access method provides xe2x80x9cstream accessxe2x80x9d to a specified table. When using stream access, the SQL executor first reads all qualifying tuples in a specified table, and subsequently monitors for and returns new qualifying tuples being added to the table. The first part of the method is performed by a regular table scan, while the second part of the method is performed by a so-called delta scan. The monitoring function is performed until the cursor representing the SQL statement being executed, including the scan operations, is closed by the calling application.
The stream access mode of operation causes execution of an SQL statement to block, and thus be suspended, when there no qualifying tuples to be returned to the calling application. Execution of the SQL statement automatically resumes (actually, is rescheduled) when new data becomes available.
A set of data structures are provided to keep track of active table access operators associated with active statements (whose execution has not yet terminated). The data structures keep track of the status of each active table access (scan) operator. In particular, a session control block is expanded to include fields for keeping track of whether a scan is in the initial xe2x80x9cregularxe2x80x9d scan phase, or is in the delta scan phase of a scan operation. The session control block also includes a xe2x80x9cdelta scan listxe2x80x9d of new and modified rows to be processed in the delta scan phase. When any transaction terminates, and the terminated transaction updated the delta scan list, the delta scan operation is automatically rescheduled for execution to process the rows in the delta scan list that were added or modified by the terminated transaction.
Another feature of the present invention is a xe2x80x9cskip conflictxe2x80x9d access mode of operation. Normally, a table access operator is blocked when it attempts to access a row that is locked in a conflicting mode, such as a write lock mode, which is inconsistent with the read or write access required by the table access operator. Generally, the lock in the conflicting mode will be held by a transaction other than the transaction associated with the operator in question. When the SQL statement being executed uses the new xe2x80x9cskip conflictxe2x80x9d syntax provided by the present invention, the operator skips over rows that are locked in a conflicting mode, which would otherwise cause the operator to suspend operation. Furthermore, if the operator is operating in streaming mode, a key that identifies such skipped rows is added to the operator""s list of rows to be processed during the delta scan phase.
Another feature of the present invention is an execution engine for executing select statements that have an embedded delete or update operation. A select statement with an embedded delete operation is used by applications to dequeue data from a table. A dequeue statement in an application program is converted into a select statement (query) that uses the stream, skip conflict and embedded delete features of the present invention. A select statement with an embedded update operation enables an application to have the database management system modify rows in a table before the rows are returned to the application, as a single integrated operation.
When the statement being executed is a streaming mode query that accesses a partitioned table, the fan out operator sends non-blocking requests to the partition scan operators so that when one access to one partition is blocked, rows from another partition are processed.
Yet another feature of the present invention is an execution engine that, upon rollback of an aborted transaction, has the ability to set fields of the rows that are updated or deleted by the transaction prior to aborting. In particular, when a select, update or delete statement includes the xe2x80x9cset on rollbackxe2x80x9d clause, the respective table access operator of the execution engine modifies the xe2x80x9cbefore imagexe2x80x9d of each row that is stored by the transaction log manager. The modified before image includes the field updates specified by the set on rollback clause. If a transaction associated with the statement aborts, the restored before images are set to include the field updates specified by the set on rollback clause when the transaction log manager restores the before images of each row that was deleted or updated by the transaction.