1. Field of the Invention
The present invention pertains generally to computer languages, and in particular to descriptions for queries that operate upon data stream data continuously.
2. Description of the Related Art
CEP Applications and their Requirements
Software applications are needed to address a growing set of problems arising in such diverse areas as:
Finance (program trading and execution, risk management, pricing, fraud management)
Network Management (security monitoring and response, network and application monitoring, SLA monitoring)
Business Process Management (process monitoring, exception management, scheduling)
Sensor Networks (RFID applications, manufacturing process monitoring, power grid monitoring, military)
These applications, sometimes referred to as Complex Event Processing (CEP) applications, have a number of requirements that are difficult to meet using conventional tools. CEP applications must:
1. Support arbitrarily complex computation (for example, expressions, filtering, windows, aggregations, joins between multiple data sources, correlations of real-time and/or historical data, pattern matching, and so on)
2. Process large volumes of messages at high incoming message rates (from 1,000 to 100,000 messages per second, or more)
3. Exhibit very low processing latency (from milliseconds to seconds)
4. Be scalable and reliable
5. Be easy to build, modify, and maintain
Until recently, there have been two ways to build CEP applications:
Use a relational database (possibly of an in-memory variety)
Code a custom “black box” solution
Unfortunately, both approaches create major problems, discussed below.
Using Relational Databases for CEP Applications
Relational databases have been in widespread use for decades. The now-standard relational data model and Structured Query Language (SQL) are well suited to writing traditional (non-CEP) applications. These traditional applications are characterized by the following:
Most of the data held in the database is fairly static.
Updates and/or complex queries are comparatively infrequent.
This is not true for CEP applications, however. As the volume of incoming messages, events, and updates goes up, and the increasing demands of business require much more frequent complex analysis, conventional database solutions begin to break down. More and more query requests are sent to the database, and the database becomes a bottleneck. This is not surprising, given the limitations of database technology:
A database stores everything on disk and is optimized accordingly. (In-memory databases help here, but they suffer from other problems.)
A database is hard to optimize for both rapid push AND pull of data (push/pull conflict).
A database is not designed for continuous processing. If you want to know an answer to a query ten times a second, you must issue the query ten times a second. This approach does not scale to hundreds or thousands of queries.
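The difference between pulling an answer by re-issuing a query and having a registered query pushed its input can be illustrated with a minimal sketch. The class names `PolledTable` and `ContinuousAverage`, the running-average query, and the callback interface are all hypothetical and chosen only for illustration; they do not describe any particular database or CEP engine.

```python
from typing import Callable, List


class PolledTable:
    """Pull model (hypothetical): rows are stored, and each answer
    requires re-running the whole query over the stored rows."""

    def __init__(self) -> None:
        self.rows: List[float] = []

    def insert(self, price: float) -> None:
        self.rows.append(price)

    def query_average(self) -> float:
        # Every call rescans all rows; asking ten times a second
        # means ten full scans a second.
        if not self.rows:
            raise ValueError("no rows stored yet")
        return sum(self.rows) / len(self.rows)


class ContinuousAverage:
    """Push model (hypothetical): the query is registered once and its
    result is updated incrementally as each message arrives."""

    def __init__(self, on_result: Callable[[float], None]) -> None:
        self.count = 0
        self.total = 0.0
        self.on_result = on_result  # subscriber callback, an assumption

    def on_message(self, price: float) -> None:
        # Constant work per message; no rescan of earlier data.
        self.count += 1
        self.total += price
        self.on_result(self.total / self.count)


def run_continuous(prices: List[float]) -> List[float]:
    """Feed prices to a registered query and collect every pushed result."""
    results: List[float] = []
    query = ContinuousAverage(results.append)
    for price in prices:
        query.on_message(price)
    return results
```

In this sketch the polled table does work proportional to the stored data on every request, while the registered query does a fixed amount of work per arriving message and delivers a fresh result without being asked.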
Traditional databases offer a feature called triggers, which, in theory, enables the database to respond to new data being inserted into the table. Unfortunately, all modern databases implement triggers in a uniformly unscalable and unmanageable way, as triggers were an afterthought in database design. Building complex logic in triggers is difficult or impossible, and trigger performance can be quite poor.
For these reasons, databases are rarely used for high-volume, low-latency CEP applications: they simply do not scale.
Building Custom CEP Applications
Custom applications alleviate many of the problems of databases, but they create a large number of new ones. Custom applications (also known as black boxes) start simple, as the initial requirements are typically very limited. Many begin with simple filtering or aggregation. Problems increase quickly, however, as windows, complex aggregations, correlation, pattern matching, and other levels of complexity are added. Despite the promise of a “custom solution”, performance rapidly becomes a problem. Providing enterprise features such as scalability, clustering, and high availability while developing, extending, and maintaining custom CEP applications is notoriously difficult and time-consuming.
Some custom applications are written on top of a messaging system, or a message bus. Unfortunately, message buses solve mainly transport-level problems, such as asynchronous delivery, publish/subscribe multicast, and guaranteed delivery. Other than performing basic filtering, message buses offer no support for any complex computation, correlation, or pattern matching. All of these tasks must still be implemented in a custom application.
Continuous Processing Languages
In view of the above, there is a need for a general-purpose language for creating CEP applications. More specifically, there is a need for a general-purpose language for expressing registered queries that operate on data streams continuously.
In the last few years, the Stanford STREAM project has researched and published papers describing a Continuous Query Language (CQL) that is SQL-like at its core. However, there are several disadvantages with STREAM:
STREAM is not based on a publish/subscribe model and therefore is difficult to scale.
With STREAM, queries do not operate directly on data streams, which makes STREAM complex. STREAM requires the use of three types of operators (stream-to-relation, relation-to-relation, and relation-to-stream) to process data stream data and produce an output data stream. This results in an overly complex object model.
STREAM does not have a simple clause for expressing a pattern. Detecting patterns can be a key part of a CEP application.
STREAM has neither the ability to express a query that correlates data from a database table to a data stream, nor the ability to express a query that writes data from a data stream to a database table.
STREAM does not have a clause for defining a named window by data stream rows and/or time, where the defined window can be used in one or more queries.
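The three operator types attributed to STREAM above can be illustrated with a minimal sketch showing how a single query must pass through all three stages before an output data stream is produced. This is not STREAM's implementation or CQL syntax; the function names, the `(sequence, symbol, price)` row layout, and the choice of a row-count window and an insert-only output are assumptions made for illustration, and rows are assumed to be distinct.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical row layout: (sequence number, symbol, price).
Row = Tuple[int, str, float]


def rows_window(stream: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Stream-to-relation: after each arrival, yield the relation made
    of the most recent `size` rows."""
    window: List[Row] = []
    for row in stream:
        window.append(row)
        if len(window) > size:
            window.pop(0)  # evict the oldest row
        yield list(window)


def select_above(relation: List[Row], threshold: float) -> List[Row]:
    """Relation-to-relation: an ordinary filter over a finite relation."""
    return [row for row in relation if row[2] > threshold]


def insert_stream(relations: Iterable[List[Row]]) -> Iterator[Row]:
    """Relation-to-stream: emit each row that is in the current relation
    but was not in the previous one."""
    previous: List[Row] = []
    for current in relations:
        for row in current:
            if row not in previous:
                yield row
        previous = current


def run_pipeline(stream: Iterable[Row], size: int, threshold: float) -> List[Row]:
    """Chain the three stages: window, filter, then convert back to a stream."""
    relations = (select_above(rel, threshold) for rel in rows_window(stream, size))
    return list(insert_stream(relations))
```

Even for a simple filter, the input passes from stream to relation and back to stream, which is the layering the text describes as adding complexity to the object model.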
More information about STREAM can be found in the “Description of the Related Art” section of the provisional application Ser. No. 60/650,198 titled “Continuous Processing Language for Real-time Data Streams” which is referenced in the “Related Applications” section above.
Berkeley's TelegraphCQ project has also researched and published papers describing a continuous query system built on a modified version of the open-source PostgreSQL database. In addition, Brown University has an Aurora data stream management system project. Both the Berkeley and Brown systems generally suffer from the same disadvantages as described with respect to STREAM. More information about the Berkeley and Brown systems can be found in provisional application Ser. No. 60/650,198 titled “Continuous Processing Language for Real-time Data Streams” which is referenced in the “Related Applications” section above.