As automated business processes, such as Web services and online transactions, become ubiquitous, unprecedented volumes of business events are continuously generated and recorded as event streams. Complex Event Processing (CEP), which aims to detect interesting event patterns in event streams, is gaining adoption by enterprises for quickly detecting and reacting to critical business situations. Common CEP applications include business activity monitoring, supply chain management, and anomaly detection. Major database vendors have recently made significant efforts to build event-driven architectures.
The event patterns in CEP specify complex temporal and logical relationships among events. Consider the example event pattern EP1 below, in which “->” represents the temporal relationship between two events and [totalPrice>200] is the predicate on the GenerateQuote event. This pattern monitors cancelled orders that involve both suppliers and remote stocks, with a quoted total price above $200. Frequent occurrences of such patterns may indicate, for example, the need for immediate inventory management.
Event Pattern EP1:
((OrderFromSupplier->GenerateQuote[totalPrice>200])^(UseRemoteStock->GenerateInvoice))->CancelOrder
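To make the pattern concrete, the following is a minimal sketch of checking EP1 against the complete event list of a single transaction. It is illustrative only and is not the automaton-based matching used by CEP systems. It assumes that "^" denotes conjunction (both sub-sequences occur, in any relative order) and that each event is a hypothetical (type, attributes) pair; the function names are likewise invented for this sketch.

```python
def earliest_seq_end(events, first, second, pred=lambda attrs: True):
    """Index of the earliest `second` event that satisfies `pred` and
    follows some `first` event, or None if the sequence never completes."""
    seen_first = False
    for i, (etype, attrs) in enumerate(events):
        if etype == first:
            seen_first = True
        elif etype == second and seen_first and pred(attrs):
            return i
    return None


def matches_ep1(events):
    """True if one transaction's events match EP1 (with "^" read as conjunction)."""
    left = earliest_seq_end(
        events, "OrderFromSupplier", "GenerateQuote",
        pred=lambda attrs: attrs.get("totalPrice", 0) > 200,
    )
    right = earliest_seq_end(events, "UseRemoteStock", "GenerateInvoice")
    if left is None or right is None:
        return False
    # CancelOrder must come after both sub-sequences have completed.
    end = max(left, right)
    return any(etype == "CancelOrder" for etype, _ in events[end + 1:])
```

Taking the earliest completion of each sub-sequence is sufficient here, because any later completion only leaves fewer events in which a CancelOrder could follow.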
State-of-the-art CEP systems employ automata for event pattern matching. When there are large numbers of concurrent business processes, many partial query matches may be kept in automata states. Events arriving later need to be evaluated against all these partial matches to produce query results. Also, event streams tend to be high-speed and potentially infinite. Providing real-time responses, as applications often require in order to take prompt action, therefore poses serious challenges for CPU and memory utilization in CEP.
One important class of event queries is called alert queries. Alert queries correspond to key tasks in business activity monitoring, including detection of shoplifting, large or suspicious financial transactions, or other undue business actions such as orders cancelled for certain reasons (see the example above). These queries detect exceptions to the normal business flows and are thus expected to be highly selective. Keeping large numbers of partial matches that do not lead to any query results can cause a major drain on available system resources.
Typically, many business events do not occur randomly. Instead they follow pre-defined business logic or rules, such as a workflow model. Such CEP applications include:
Business activity monitoring: an online retailer may want to detect anomalies in its order processing transactions. In this case, the events are generated from a BPEL workflow engine, a business rule engine or simply a customized program.
Manufacturing monitoring: a manufacturer may want to monitor its streamlined production process. The process events correspond to pre-defined procedures.
ClickStream analysis: a shopping website may want to monitor the click stream to discover the user navigation pattern. Here the user click events depend on how the website is structured.
As a consequence, various constraints may exist among events in these CEP applications. In particular, occurrence constraints, such as two events being mutually exclusive, and order constraints, such as one event having to occur before another, can be observed in all the applications listed above. The majority of software design patterns exhibit such constraints as well.
The availability of these constraints makes it possible to predict the non-occurrence of future events from the events observed so far. Such predictions help identify which partial query matches will not lead to final results, so that further effort spent maintaining and evaluating these partial matches can be avoided. The example below illustrates such optimization opportunities, which remain unexplored.
Example 1. Assume the event stream is generated by online order transactions that follow the workflow in FIG. 1. Each task in the workflow, if performed, submits an event to the event stream. Both occurrence and order constraints can be inferred from this workflow. For example, the UseLocalStock and the UseRemoteStock events are mutually exclusive. Also, any GenerateQuote event, if it occurs, must occur before the SendQuote event in a transaction.
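The two constraints named in this example can be written down as plain data. The sketch below is one possible representation, assuming each transaction is reduced to an ordered list of event type names; the class and function names are hypothetical and the workflow of FIG. 1 itself is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exclusive:
    """Occurrence constraint: `a` and `b` never both occur in one transaction."""
    a: str
    b: str


@dataclass(frozen=True)
class Before:
    """Order constraint: no `first` event may occur after a `then` event."""
    first: str
    then: str


def violates(trace, constraint):
    """True if the ordered list of event types `trace` breaks `constraint`."""
    if isinstance(constraint, Exclusive):
        return constraint.a in trace and constraint.b in trace
    seen_then = False
    for etype in trace:
        if etype == constraint.then:
            seen_then = True
        elif etype == constraint.first and seen_then:
            return True
    return False


# The two constraints stated in Example 1.
EXAMPLE_CONSTRAINTS = [
    Exclusive("UseLocalStock", "UseRemoteStock"),
    Before("GenerateQuote", "SendQuote"),
]
```

A trace produced by a workflow that obeys these constraints should never make `violates` return True, which is what allows a query processor to rely on them.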
Consider the example event pattern EP1 again. By exploiting the event constraints, whenever a UseLocalStock event occurs, this transaction is guaranteed to not match the query because the UseRemoteStock event will never occur in this transaction. Also, once a SendQuote event is seen in a transaction, and no GenerateQuote event with totalPrice>200 has been observed so far, the transaction will not match the query because no GenerateQuote event will happen after the SendQuote event. In either case, any partial matches by these transactions need not be maintained and evaluated further as they are guaranteed to never lead to a final result. If the query processing of large numbers of transactions could be terminated early, a significant amount of CPU and memory resources would be saved.
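The early-termination reasoning above can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism of any particular CEP engine: it tracks only which required event types a transaction has already satisfied (ignoring the sequencing inside EP1), and the class name, method names and transaction identifiers are all hypothetical.

```python
class ConstraintPruner:
    """Drops transactions that can no longer match, using event constraints."""

    def __init__(self, required, exclusive, before):
        # required: {event_type: predicate over the event's attributes}
        self.required = required
        # ruled_out_by[x] = event types that cannot occur once x has been seen
        self.ruled_out_by = {}
        for a, b in exclusive:  # a and b are mutually exclusive
            self.ruled_out_by.setdefault(a, set()).add(b)
            self.ruled_out_by.setdefault(b, set()).add(a)
        for first, then in before:  # `first` never occurs after `then`
            self.ruled_out_by.setdefault(then, set()).add(first)
        self.satisfied = {}  # transaction id -> required types seen so far
        self.pruned = set()

    def on_event(self, txn, etype, attrs=None):
        """Process one event; return False if the transaction is (now) pruned."""
        if txn in self.pruned:
            return False  # no further work for a dead transaction
        done = self.satisfied.setdefault(txn, set())
        pred = self.required.get(etype)
        if pred is not None and pred(attrs or {}):
            done.add(etype)
        for dead in self.ruled_out_by.get(etype, ()):
            if dead in self.required and dead not in done:
                # A still-missing required event can never arrive: discard state.
                self.pruned.add(txn)
                del self.satisfied[txn]
                return False
        return True


always = lambda attrs: True
pruner = ConstraintPruner(
    required={
        "OrderFromSupplier": always,
        "GenerateQuote": lambda attrs: attrs.get("totalPrice", 0) > 200,
        "UseRemoteStock": always,
        "GenerateInvoice": always,
        "CancelOrder": always,
    },
    exclusive=[("UseLocalStock", "UseRemoteStock")],
    before=[("GenerateQuote", "SendQuote")],
)
```

With this setup, a UseLocalStock event prunes its transaction immediately, and a SendQuote event prunes any transaction that has not yet produced a GenerateQuote with totalPrice>200, mirroring the two cases described above.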
Several observations can be made from the above example. First, although the event constraints are known at query compilation time, the real optimization opportunities only emerge at runtime, based on the partial workflow executed so far (i.e., what events have been observed). For example, although the UseLocalStock and the UseRemoteStock events are known to be exclusive, it is only once one of them actually occurs that the other is known not to appear in the same transaction. Second, both occurrence and order constraints can be exploited to short-cut query execution.
As event processing gains popularity in many applications, increasing effort has been devoted to developing efficient event processing systems. Existing work includes streaming databases such as HiFi that support SQL-style queries, pub/sub systems that support simple filtering queries, and CEP systems such as SNOOP, Amit, CEDR, Cayuga and SASE that support event pattern queries expressed in more powerful languages. These works focus on query model/language design and query algebra development. None of them considers exploiting the common event constraints.
Semantic query optimization (SQO), i.e., using schema knowledge to optimize queries, has been extensively studied for traditional databases. Major techniques focus on optimizing value-based filtering or matching operations, including join and predicate elimination and introduction. They remain applicable in CEP for identifying efficient query plans at compilation time. However, these existing SQO techniques are mainly designed for static query optimization and are inappropriate for runtime use. SQO has also been studied for optimizing queries over streaming XML documents. In CEP, however, event data from possibly thousands or millions of concurrent processes can be interleaved, which yields huge numbers of potential partial matches (one for each process) at runtime. Also, more types of constraints can be observed in business processes than in XML schema. All these pose stringent requirements on the scalability, generality and extensibility of exploiting constraints in CEP. The present work is also related to punctuation. The existing works on punctuation mainly focus on utilizing punctuations to reduce the memory usage of SQL-type stream queries. Here, by contrast, punctuations (effectively dynamic constraints) derived from event constraints are used to reduce both CPU and memory cost for CEP queries.
Other related areas include workflow management, since the event constraints are extracted from the workflows. The existing work on workflow management focuses on two problems, workflow analysis and workflow verification. Workflow analysis involves the soundness proof of a workflow and the identification of critical activities in a workflow. Workflow verification deals with the following problem: given a finite set S of dependencies, check whether there is a workflow execution (or whether all executions) satisfy all the dependencies in S. The exploitation of the order constraints relates to the work on temporal reasoning, i.e., detecting whether a cycle exists among the order constraints in the query and in the event data. However, the existing works on temporal reasoning focus on language specification and enforcement instead of utilizing temporal constraints to optimize queries.