Traditional relational database systems receive commands falling into two categories: data definition commands (DDL) and data manipulation commands (DML). Data manipulation commands can be categorized as either queries that read data in the database or update statements that insert, delete or update data in the database base. Traditional database systems are designed and optimized for those operations. These systems are not designed and optimized for allowing applications to register to receive notifications when new data becomes available or existing data is updated.
However, receiving such notifications is essential for many applications. The need for such services has caused most transaction service vendors to extend their systems by including separate, special purpose resource managers for queue management and publish/subscribe services. These resource managers allow applications to enqueue and dequeue requests for information. Publish/subscribe services allow applications to publish and subscribe to notifications.
In such systems, transactions access both an SQL database system (which may be considered to be a type of resource manager) as well as resource managers for queuing and/or publish/subscribe services. As a result, the SQL database system and the other resource managers have to participate together in an expensive two phase commit protocol. Due to the lack of integration between the resource managers and the database system, the SQL compiler cannot optimize access to both the data stored by the queuing or publish/subscribe resource manager and the data stored in the database, and cannot perform joins across the different data sources.
To avoid subjecting applications to the two-phase commit protocol, some SQL database vendors (e.g., Oracle and Sybase) have integrated transactional queuing and publish/subscribe services into their database products. While their implementations remove the need for a two-phase commit protocol, these implementations use special purpose objects for queues and publication channels. These implementations prevent queues and publication channels from being accessed as part of SQL statements, unless the user is willing to forgo important characteristic properties of queue or notification channels, particularly suspending execution when all qualifying tuples have been returned and automatically resuming execution when new tuples become available. It also effectively prevents the SQL compiler in these systems from optimizing access to notifications and SQL data.
The present invention provides transactional queuing and publish/subscribe extensions to an SQL database infrastructure. These extensions do not introduce any special objects. Applications access regular SQL database tables, including tables used as queues or publication channels. Applications use SQL select statements to subscribe and/or dequeue notifications. Furthermore, applications use SQL insert and update statements to publish notifications. These extensions remove the need for a two-phase commit protocol, allow applications to perform join operations on data from different sources, and allow the SQL compiler to optimize access to both notifications and "normal" SQL data, using previously developed query optimization techniques. Further, execution of an SQL statement that accesses a data stream is blocked when all qualifying tuples have been processed and then resumes once new notifications become available.
As indicated above, the present invention extends traditional relational database systems to perform operations on a continuous stream of tuples, while retaining the traditional benefits of such systems, such as set oriented access using relational operators, transactional protection, and so on. The stream of tuples may represent queue entries, or notifications.
A simple example will assist in understanding the difference between traditional and streaming queries. A traditional query for viewing or otherwise processing records relating to sales by a particular salesperson might be: EQU select*from SalesInfo where salesperson="Johnson, Mary".
This query would retrieve all records from the table named "SalesInfo" having a value of "Johnson, Mary" in the salesperson field (column). Only records in the table at the time the query is submitted will be reported by this query.
However, if the user wants the system to continuously monitor new sales entries for a particular salesperson, the user of a traditional relational database system cannot define a simple SQL statement to perform that function, since monitoring features are not an integrated part of any of the widely used "dialects" of SQL. In contrast to that, the present invention allows users to issue an SQL select statement that performs this function: EQU select*from stream(SalesInfo) where salesperson="Johnson, Mary";
Using the present invention, when an SQL statement specifies stream access, the execution of the SQL statement never returns an "end of data" indicator. Rather, when all qualifying tuples in the specified table have been processed, continued execution is blocked, and then resumes when more qualifying tuples become available.