1. Technical Field
This disclosure generally relates to streaming applications, and more specifically relates to restoration of consistent regions of streaming applications in a streaming environment.
2. Background Art
Streaming applications are becoming more common due to the high performance that can be achieved from near real-time processing of streaming data. A streaming application is organized as a data flow graph consisting of multiple operators, connected via stream connections, that each process streaming data in near real-time. An operator typically takes in streaming data in the form of data tuples, operates on the tuples in some fashion, and outputs the processed tuples to the next operator in the flow graph. A subgraph is a portion of the flow graph of the application.
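By way of illustration only, the operator and flow graph structure described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation of any particular streaming product: the helper names `make_operator` and `build_flow`, the dictionary representation of a tuple, and the linear (non-branching) graph are all hypothetical choices made for the example.

```python
from typing import Callable, Iterable, Iterator, List, Optional

# Assumption: a data tuple is modeled as a plain dict.
DataTuple = dict
Operator = Callable[[Iterable[DataTuple]], Iterator[DataTuple]]


def make_operator(fn: Callable[[DataTuple], Optional[DataTuple]]) -> Operator:
    """Wrap a per-tuple function as an operator over a stream.

    Returning None from fn drops the tuple (a filter).
    """
    def operator(stream: Iterable[DataTuple]) -> Iterator[DataTuple]:
        for data_tuple in stream:
            out = fn(data_tuple)
            if out is not None:
                yield out  # pass the processed tuple to the next operator
    return operator


def build_flow(operators: List[Operator]) -> Operator:
    """Connect operators in order; the result is itself an operator."""
    def flow(stream: Iterable[DataTuple]) -> Iterator[DataTuple]:
        for op in operators:
            stream = op(stream)
        return iter(stream)
    return flow


# Three hypothetical operators.
parse = make_operator(lambda t: {"value": int(t["raw"])})
keep_even = make_operator(lambda t: t if t["value"] % 2 == 0 else None)
double = make_operator(lambda t: {"value": t["value"] * 2})

# A subgraph is a portion of the flow graph: here, the last two operators.
subgraph = build_flow([keep_even, double])
application = build_flow([parse, subgraph])

result = list(application({"raw": str(i)} for i in range(5)))
print(result)
```

Because a connected group of operators is itself an operator in this sketch, a subgraph can be treated as a unit, which is the property the consistent region discussion below relies on.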
Because of business requirements, some applications require that all tuples in an application stream be processed at least once. A consistent region can be defined in streams processing to meet the requirements for at-least-once processing. A consistent region is a subgraph where the states of the operators become consistent by processing all the tuples within defined points on a stream. The consistent state comprises a collection of persisted operator states that are consistent in that each has processed all tuples up to a certain logical point. On a failure, a consistent region is reset to its last successfully persisted state, and source/start operators of a region can replay any tuples submitted since the last persisted state. The replay enables applications to achieve at-least-once tuple processing.
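The persist, reset, and replay behavior described above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: the class names `ReplayableSource`, `SummingOperator`, and `ConsistentRegion` are hypothetical, the persisted state is held in memory rather than in durable storage, and the `sink` list stands in for a downstream consumer outside the region whose contents are not rolled back.

```python
import copy


class ReplayableSource:
    """Source operator that can rewind to an earlier position (assumed)."""

    def __init__(self, data):
        self._data = list(data)
        self.position = 0

    def next_tuple(self):
        if self.position >= len(self._data):
            return None
        value = self._data[self.position]
        self.position += 1
        return value


class SummingOperator:
    """Stateful operator that forwards each tuple to an external sink."""

    def __init__(self, sink):
        self.state = {"count": 0, "total": 0}
        self._sink = sink

    def process(self, value):
        self.state["count"] += 1
        self.state["total"] += value
        self._sink.append(value)  # downstream delivery is not undone on reset


class ConsistentRegion:
    def __init__(self, source, operator):
        self.source = source
        self.operator = operator
        self._persisted = None
        self.persist()  # initial consistent state

    def persist(self):
        # Record the source position together with the operator state, so
        # the saved state reflects all tuples up to one logical point.
        self._persisted = (self.source.position,
                           copy.deepcopy(self.operator.state))

    def reset(self):
        # On failure, return to the last successfully persisted state.
        position, state = self._persisted
        self.source.position = position
        self.operator.state = copy.deepcopy(state)

    def run(self, max_tuples):
        for _ in range(max_tuples):
            value = self.source.next_tuple()
            if value is None:
                break
            self.operator.process(value)


sink = []
region = ConsistentRegion(ReplayableSource([1, 2, 3, 4, 5]),
                          SummingOperator(sink))
region.run(2)
region.persist()   # consistent state after tuples 1 and 2
region.run(2)      # tuples 3 and 4 are processed, then a failure occurs
region.reset()     # state rolls back; source rewinds to just after tuple 2
region.run(3)      # tuples 3 and 4 are replayed, then tuple 5 is processed

print(region.operator.state)
print(sink)
```

In this sketch the operator state ends up as if every tuple had been processed exactly once, while the sink receives tuples 3 and 4 twice. That duplication is what distinguishes at-least-once processing from exactly-once processing.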
Thus, at-least-once tuple processing includes the ability for different parts of the application to recover state and restart processing tuples. The ability to recover state and replay tuples enables the application to recover from failures. A consistent region thus identifies a subgraph of the application that can recover upon a failure if required. The potential downside to a recovery is that the upstream operators may back up with pending data or downstream operators may be flooded with replayed data. In addition, resources utilized by an application region during restore may become overwhelmed. This is especially apparent when a failure affects multiple consistent regions at the same time (e.g., a host failure) and the regions rely on a common resource for processing (e.g., a shared database) or a common node.