Stream processing has emerged as a paradigm for analyzing streaming data (e.g., audio, video, sensor readings, and business data) in real time. Stream processing applications are typically built as data-flow graphs comprising interconnected stream operators that implement analytics over the incoming data streams. Each operator is a component of the application.
During operation of a stream processing application, a stream operator may fail (i.e., stop executing its operations or responding to other operators) for any one or more of several reasons, including, but not limited to: a heisenbug (i.e., a computer bug that disappears or alters its characteristics when an attempt is made to study it) in the stream operator code (e.g., a timing error), a node failure (e.g., a power outage), a kernel failure (e.g., a device driver crashes and forces a machine reboot), a transient hardware failure (e.g., a memory error corrupts an application variable and causes the stream processing application to crash), or a network failure (e.g., the network cable gets disconnected, and no other node can send data to the operator).
Many data stream processing applications, in the form of one or more operators connected via data streams, maintain large state in memory (such as sliding windows or Bloom filters) in order to perform various analytics (such as sorting, aggregation, and join). For fault tolerance purposes, a data stream processing application may need to periodically checkpoint its state to persistent storage (termed “checkpoint data store”) so that, in case of a failure, the application can recover its state from a saved checkpoint and resume normal operations.
Unfortunately, checkpointing a large operator state can impose significant overhead on both the stream processing application and the checkpoint data store. The standard approach to checkpointing an operator state is to serialize all of the operator state data and write the serialized data to the checkpoint data store. For an operator with a large state, the application must spend a substantial amount of time serializing the state and writing it to the checkpoint data store, which stalls normal processing. Furthermore, the checkpointed state data usually consumes a large amount of the storage space and I/O bandwidth of the checkpoint data store.
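The standard approach described above can be illustrated with a minimal Python sketch. The names `full_checkpoint`, `store`, and `window` are hypothetical, and an in-memory dictionary stands in for the checkpoint data store; the sketch only shows that the serialized size tracks the total state size, not the amount of change.

```python
import pickle

def full_checkpoint(state, store, seq):
    """Serialize the entire operator state and save it under sequence number `seq`."""
    blob = pickle.dumps(state)   # cost grows with the total state size
    store[seq] = blob            # stands in for a write to the checkpoint data store
    return len(blob)

# Hypothetical operator state: a sliding window of 100,000 readings.
window = list(range(100_000))
store = {}

size_1 = full_checkpoint(window, store, seq=1)
window[0] = -1                   # only a single element changes between checkpoints
size_2 = full_checkpoint(window, store, seq=2)

print(size_1, size_2)            # both checkpoints are roughly the same size
```

Even though only one element changed, the second checkpoint serializes and writes the whole window again.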
However, in many stream processing applications, the amount of change in an operator state between two consecutive checkpoints is usually much smaller than the total size of the operator state. In this case, it would be more efficient to checkpoint only the changed portion of the large operator state rather than the whole state. Hence, a need is recognized to devise an incremental checkpointing method for stream processing applications.
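As a rough illustration of checkpointing only the changed portion, the following Python sketch tracks which keys of a key-value state were modified since the last checkpoint. The class `TrackedState` and its delta format are assumptions made for illustration only; they are not the method of any particular system.

```python
import pickle

class TrackedState:
    """Key-value operator state that logs which keys changed since the last checkpoint."""

    def __init__(self):
        self.data = {}
        self.dirty = set()     # keys written since the last checkpoint
        self.deleted = set()   # keys removed since the last checkpoint

    def put(self, key, value):
        self.data[key] = value
        self.dirty.add(key)
        self.deleted.discard(key)

    def remove(self, key):
        if key in self.data:
            del self.data[key]
            self.dirty.discard(key)
            self.deleted.add(key)

    def _reset_log(self):
        self.dirty.clear()
        self.deleted.clear()

    def full_checkpoint(self):
        """Serialize the whole state and reset the change log."""
        self._reset_log()
        return pickle.dumps(self.data)

    def delta_checkpoint(self):
        """Serialize only the logged changes and reset the change log."""
        delta = {"put": {k: self.data[k] for k in self.dirty},
                 "del": list(self.deleted)}
        self._reset_log()
        return pickle.dumps(delta)

state = TrackedState()
for i in range(10_000):
    state.put(i, i * i)
full = state.full_checkpoint()

state.put(42, 0)     # two small changes between checkpoints
state.remove(7)
delta = state.delta_checkpoint()

print(len(full), len(delta))   # the delta is far smaller than the full checkpoint
```

The trade-off in this sketch is that every update pays a small bookkeeping cost during normal processing in exchange for much smaller checkpoints.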
Incremental checkpointing schemes already exist, including paging-based, pre-copy-based, and hash-based approaches. All of these approaches essentially address how to record changes to application state during normal computation and how to checkpoint the logged changes as a delta checkpoint. Some of them checkpoint the whole application address space (e.g., by detecting and tracking dirty pages in the application address space and saving the dirty pages as a delta checkpoint), and are therefore inappropriate for checkpointing an operator state, which is only a part of the application address space. Most paging-based and pre-copy-based approaches require modifying the operating system or virtual machine monitor, or installing kernel modules, which may not be feasible in practice. In addition, prior incremental checkpointing approaches largely ignore the restoration cost and can degrade the restoration time arbitrarily. This is inappropriate for data stream processing applications, because a long restoration time may not only lead to unacceptable delay in stream processing, but may also overwhelm the data sources and intermediate buffers and cause data loss.
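The restoration cost concern can be made concrete with a short Python sketch. It assumes a hypothetical scheme in which a full checkpoint is followed by an ever-growing chain of delta checkpoints, each recorded as a dictionary with `"put"` and `"del"` entries; the function name `restore` and this delta format are illustrative assumptions only.

```python
def restore(base, deltas):
    """Rebuild operator state from a full checkpoint plus an ordered list of deltas."""
    state = dict(base)
    for delta in deltas:               # every delta since the base must be replayed
        state.update(delta["put"])
        for key in delta["del"]:
            state.pop(key, None)
    return state

# Hypothetical history: one full checkpoint followed by 1,000 tiny deltas.
base = {"a": 1, "x": 0}
deltas = [{"put": {"x": i}, "del": []} for i in range(1, 1001)]

recovered = restore(base, deltas)
print(len(deltas), recovered)
```

In this sketch the restored state holds only two entries, yet recovery must read and apply all 1,000 deltas. Without some bound on the length of the chain, restoration time grows with the number of checkpoints taken rather than with the size of the state.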