1. Technical Field
This disclosure generally relates to streaming applications, and more specifically relates to checkpointing streaming applications that have one or more tuple windows.
2. Background Art
Streaming applications are known in the art, and typically include multiple processing elements coupled together in a flow graph that process streaming data in near real-time. A processing element typically takes in streaming data in the form of data tuples, operates on the data tuples in some fashion, and outputs the processed data tuples to the next processing element. Streaming applications are becoming more common due to the high performance that can be achieved from near real-time processing of streaming data.
Checkpointing is well known in the art of computer programs as the process of periodically saving the state of a running computer program so that the state can be restored if a failure occurs. Checkpointing a streaming application presents additional challenges, because the cost of taking a checkpoint can degrade the performance of a streaming application, which typically must process a data stream in near real-time. It is known in the art of streaming applications to checkpoint periodically, meaning a full checkpoint is taken at set time intervals. For example, if the selected time interval is 30 seconds, the streaming application creates a checkpoint of the state of the processing elements in the flow graph every 30 seconds.
Because checkpointing can negatively affect the performance of streaming applications, incremental checkpointing has been developed, in which a full checkpoint is followed by multiple “delta checkpoints” that each reflect the changes since the last checkpoint, whether that last checkpoint was a full checkpoint or another delta checkpoint. Delta checkpoints are typically much smaller than full checkpoints, so taking a delta checkpoint affects the performance of the streaming application less than taking a full checkpoint at each periodic interval.
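The incremental scheme described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the state of a processing element is a flat dictionary of comparable values; the class `IncrementalCheckpointer` and its methods are hypothetical names and are not taken from this disclosure or from any streaming product's API. Each delta records only the keys that changed or were removed since the previous checkpoint (full or delta), and restoration replays the deltas, in order, on top of the full checkpoint.

```python
import copy
from typing import Any, Dict, List, Optional


class IncrementalCheckpointer:
    """Hypothetical sketch: one full checkpoint followed by delta checkpoints."""

    def __init__(self) -> None:
        self.full: Optional[Dict[str, Any]] = None   # most recent full checkpoint
        self.deltas: List[Dict[str, Any]] = []       # deltas taken since that full checkpoint
        self._last: Dict[str, Any] = {}              # state as of the previous checkpoint

    def full_checkpoint(self, state: Dict[str, Any]) -> None:
        # Save a complete copy of the state and start a new delta chain.
        self.full = copy.deepcopy(state)
        self.deltas = []
        self._last = copy.deepcopy(state)

    def delta_checkpoint(self, state: Dict[str, Any]) -> Dict[str, Any]:
        # Record only what changed since the last checkpoint, full or delta.
        if self.full is None:
            raise RuntimeError("a full checkpoint must be taken first")
        changed = {k: copy.deepcopy(v) for k, v in state.items()
                   if k not in self._last or self._last[k] != v}
        removed = [k for k in self._last if k not in state]
        delta = {"changed": changed, "removed": removed}
        self.deltas.append(delta)
        self._last = copy.deepcopy(state)
        return delta

    def restore(self) -> Dict[str, Any]:
        # Rebuild the latest state: start from the full checkpoint, apply deltas in order.
        if self.full is None:
            raise RuntimeError("no checkpoint to restore from")
        state = copy.deepcopy(self.full)
        for delta in self.deltas:
            state.update(copy.deepcopy(delta["changed"]))
            for k in delta["removed"]:
                state.pop(k, None)
        return state


# Usage: one full checkpoint, then two small deltas.
cp = IncrementalCheckpointer()
cp.full_checkpoint({"count": 0, "sum": 0, "last": None})
cp.delta_checkpoint({"count": 1, "sum": 5, "last": None})  # saves only "count" and "sum"
cp.delta_checkpoint({"count": 2, "sum": 5})                 # saves "count", records "last" as removed
print(cp.restore())  # {'count': 2, 'sum': 5}
```

The sketch keeps the whole delta chain in memory and never compacts it; a practical system would typically bound the chain by taking a new full checkpoint after some number of deltas, which this sketch supports only by calling `full_checkpoint` again.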
Some streaming applications have tuple windows. Checkpointing an application that has a tuple window can incur significant overhead when the tuple window is large, meaning many tuples are within the tuple window. Forcing a checkpoint at a fixed periodic interval could require checkpointing many tuples in a tuple window, which negatively impacts the performance of the streaming application.