This description relates to recoverable stream processing.
Some data processing programs receive a batch of data to be processed (e.g., one or more files or database tables), and the amount of time needed to process that data is well-defined because it depends on the amount of data in the batch. This type of processing is called “batch processing.” Other data processing programs receive one or more streams of data that are processed for a potentially unknown amount of time, because the streams may include an unknown or arbitrary number of data units, or a potentially continuous flow of data units. This type of processing is called “stream processing” or “continuous flow processing.” The factors that are relevant to providing recoverability in a data processing system can depend on the type of processing being used, as well as on other characteristics, such as whether there are multiple interacting data processing programs and whether the order in which data units are processed is deterministic.
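By way of illustration only, the distinction between batch processing and stream processing can be sketched in a few lines of Python. The names `process`, `run_batch`, and `run_stream` are hypothetical and do not come from this description, and the doubling operation merely stands in for whatever work a real program would perform on each data unit.

```python
import itertools
from typing import Iterable, Iterator, List


def process(unit: int) -> int:
    # Placeholder for the real per-unit work (an assumption for illustration).
    return unit * 2


def run_batch(batch: List[int]) -> List[int]:
    # Batch processing: the input is finite and known up front, so the
    # amount of work is determined by len(batch).
    return [process(unit) for unit in batch]


def run_stream(stream: Iterable[int]) -> Iterator[int]:
    # Stream processing: data units are handled one at a time as they
    # arrive; the loop ends only if the stream itself ends.
    for unit in stream:
        yield process(unit)


# A batch of three data units is processed to completion.
batch_result = run_batch([1, 2, 3])

# An unbounded stream (1, 2, 3, ...) never ends on its own, so the caller
# decides how many results to consume; here, only the first three.
unbounded = itertools.count(1)
first_three = list(itertools.islice(run_stream(unbounded), 3))
```

In this sketch the batch function returns only after all of its input has been consumed, whereas the stream function produces results incrementally and has no natural completion point. That lack of a completion point is one reason recoverability considerations can differ between the two types of processing.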