1. Field of the Invention
This invention relates generally to continuous processing systems that process streaming data, and, more specifically, to synchronizing message processing in a continuous processing system.
2. Description of the Background Art
A continuous processing system processes streaming data. It includes statements (such as queries), written by programmers, which operate continuously on input data streams and which publish to output data streams. In such a system, it is often difficult to achieve predictable and repeatable output results.
When statements written by programmers are compiled, an execution graph may be created, where the execution graph comprises connected primitives that correspond to the compiled statements. An execution graph in a continuous processing system specifies the path for processing messages in accordance with the statements. In other words, the continuous processing system processes messages by pushing them through the execution graph.
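By way of illustration only, the following minimal sketch shows one way that messages might be pushed through a graph of connected primitives. The class name `Primitive`, the methods `connect` and `push`, and the example filter and projection functions are hypothetical and are not drawn from any particular continuous processing system.

```python
class Primitive:
    """Hypothetical node in an execution graph; applies one function per message."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.downstream = []  # primitives that receive this primitive's output

    def connect(self, other):
        """Add an edge from this primitive to another and return the other."""
        self.downstream.append(other)
        return other

    def push(self, message, sink):
        """Process one message and forward the result along the graph."""
        out = self.fn(message)
        if out is None:  # assumption: None means the message was filtered out
            return
        if not self.downstream:  # a leaf primitive publishes to the output
            sink.append(out)
        for nxt in self.downstream:
            nxt.push(out, sink)


# Assumed two-primitive graph: a filter followed by a projection.
source = Primitive("filter", lambda m: m if m["price"] > 10 else None)
source.connect(Primitive("project", lambda m: {"symbol": m["symbol"]}))

output = []
for msg in [{"symbol": "A", "price": 5}, {"symbol": "B", "price": 20}]:
    source.push(msg, output)
```

In this sketch, each message entering the source primitive travels along the edges of the graph until it is either dropped or published, which mirrors the notion of pushing messages through the execution graph.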
In order to achieve predictable and repeatable output results, messages have to be processed in accordance with message order rules (i.e., rules that specify the order in which messages need to be processed). For example, in one embodiment, messages are assigned an internal timestamp and are processed in order of their timestamp, where messages with the same timestamp are processed together.
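The timestamp-based message order rule described above can be sketched as follows. This is a simplified illustration under assumed conditions: messages are represented as (timestamp, payload) pairs, and the function name `batches_by_timestamp` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def batches_by_timestamp(messages):
    """Yield (timestamp, payloads) in ascending timestamp order.

    Messages sharing a timestamp are grouped so they can be processed together.
    """
    # sorted() is stable, so arrival order is kept within each timestamp.
    ordered = sorted(messages, key=itemgetter(0))
    for ts, group in groupby(ordered, key=itemgetter(0)):
        yield ts, [payload for _, payload in group]


# Assumed input: (timestamp, payload) pairs arriving in arbitrary order.
messages = [(2, "c"), (1, "a"), (2, "d"), (1, "b"), (3, "e")]
result = list(batches_by_timestamp(messages))
```

Here `result` contains one batch per timestamp, ordered by timestamp, with the payloads of equal-timestamp messages kept together.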
Certain types of primitives in an execution graph may have potential for substantial delay. Examples of such primitives include primitives that make database calls or remote procedure calls, as well as primitives associated with user-defined functions.
For efficiency and speed, it is often desirable to process messages in parallel, which means that more than one row from a data stream may enter the execution graph at a particular time. With parallel processing, a primitive may process multiple messages with different timestamps at the same time. For example, a primitive that joins data in a message with data in a database (a “database joiner primitive” or “DB Joiner”) may make concurrent database calls for multiple joins at once.
If parallel processing occurs in a graph that has primitives with potential for substantial delay, messages can easily be processed out of order, which means that the output results will not be predictable and repeatable. FIG. 1 illustrates an example of this. In this example, messages from data stream 110 go into both database joiner 120 and joiner 130. Message order rules for this example system require that a message with timestamp x be joined at joiner 130 with the output of database joiner 120 for a message with timestamp x (i.e., a message with the same timestamp). If only one message (i.e., one row) from data stream 110 goes into the execution graph at a time, the message order is easily preserved. However, if multiple messages enter the graph at once, messages could be processed out of order if one primitive is slower than the other primitive. In this example, messages with timestamps 1-5 go into graph 100 and database joiner 120 is delayed. As a result, the output of database joiner 120 for message 1 is joined with message 5 in stream 110 instead of with message 1, whereas the correct result would be to join the output of the database joiner for message 1 with message 1.
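The out-of-order join described for FIG. 1 can be reproduced with a small deterministic simulation. The sketch below is illustrative only and rests on several assumptions: time is modeled as integer ticks, the stream message with timestamp t reaches the joiner at tick t, the database joiner output for that message arrives `db_delay` ticks later, and the joiner naively pairs each database output with the most recent stream message it has seen. The function name `naive_join` is hypothetical.

```python
def naive_join(timestamps, db_delay):
    """Simulate a joiner that ignores timestamps when pairing its two inputs.

    Returns (db_output_timestamp, stream_message_timestamp) pairs.
    """
    events = []
    for ts in timestamps:
        # Stream message reaches the joiner at tick ts.
        events.append((ts, 0, "stream", ts))
        # Database joiner output for the same message arrives db_delay later.
        events.append((ts + db_delay, 1, "db", ts))
    # Order by tick; at the same tick, stream messages are seen first.
    events.sort()

    latest_stream = None
    pairs = []
    for _, _, kind, ts in events:
        if kind == "stream":
            latest_stream = ts
        else:
            pairs.append((ts, latest_stream))
    return pairs


no_delay = naive_join([1, 2, 3, 4, 5], db_delay=0)
delayed = naive_join([1, 2, 3, 4, 5], db_delay=4)
```

With no delay, every database output is paired with the stream message having the same timestamp. With a delay of four ticks, the database output for message 1 is paired with message 5, which corresponds to the incorrect result described above.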
Therefore, in an execution graph where there are primitives with the potential for substantial delay, there is a need for a system and method that permits parallel processing within some areas of the execution graph while protecting other areas of the graph from processing messages out of order due to primitives with potential for substantial delay.