The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Data communication networks may comprise a hierarchy of “nodes” that receive, process, and send data streams to each other. A data stream may be an unbounded set of data sent over time. For example, a “source” node may send an unbounded set of data (“source” data) to a “processing” node. The processing node may process the source data as it is received, generate a new, unbounded set of “derived” data based on the source data, and send the derived data to a downstream node. The hierarchy may be referred to as a “workflow” or “streaming workflow”.
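For purposes of illustrating a clear example, the relationship between a source node and a processing node may be sketched as follows. This is a minimal sketch only, assuming Python generators stand in for nodes; the names `source_node` and `processing_node` and the choice of a running sum as the derived data are hypothetical and are not taken from any particular embodiment.

```python
from itertools import islice
from typing import Iterable, Iterator


def source_node() -> Iterator[int]:
    """Hypothetical source node: yields an unbounded stream of source data."""
    reading = 0
    while True:
        yield reading
        reading += 1


def processing_node(source: Iterable[int]) -> Iterator[int]:
    """Hypothetical processing node: emits derived data (a running sum)
    as each item of source data is received, without waiting for the end
    of the stream."""
    total = 0
    for value in source:
        total += value
        yield total


# A downstream node may consume as much of the derived stream as it needs.
first_five = list(islice(processing_node(source_node()), 5))
print(first_five)  # [0, 1, 3, 6, 10]
```

Because each node pulls from the node upstream of it, derived data becomes available as soon as the corresponding source data arrives.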
A streaming workflow may have many advantages. For example, in a streaming workflow, each node may generate and send derived data based on source data, as the source data is received, thus reducing latency.
Building a robust and recoverable streaming workflow, however, may be challenging. For example, a processing node may receive source data that is late or out of order. A source node may send some partial amount of source data to a processing node before terminating the source stream unexpectedly. Accordingly, the processing node may generate and send derived data based on out-of-order or incomplete source data.
One possible approach is to cause the processing node to delay processing the source data until the source data is determined to be complete. Then, the processing node may reorder the data as needed and process it. However, that approach may increase latency to unacceptable levels. For example, a node may process time-critical data, such as networking, power, or rocket telemetry data. If a processing node is forced to wait to receive additional source data from a source node, a downstream processing node or device may be unable to perform its functions properly, or may derive misleading data from the lack of data received from the suspended node. For example, the downstream node or device may determine that a network is down or that a rocket has lost contact, even though the source data not yet sent may indicate that the network has excellent connectivity with one or more other networks or that the rocket is performing within mission parameters. Furthermore, the processing node may still receive exceptionally late source data, which cannot be incorporated into the derived data already generated and sent.
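The delay-and-reorder approach described above may be sketched as follows. This is an illustrative sketch only, under stated assumptions: each record is assumed to carry a sequence number, and completeness is assumed to mean that a known number of records (`expected_count`) has arrived. The function name and record shape are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (sequence number, value).
Record = Tuple[int, float]


def process_after_complete(records: Iterable[Record],
                           expected_count: int) -> List[float]:
    """Buffer source data until it is complete, then reorder and process it.

    No derived data is produced until expected_count records have arrived,
    which is the source of the added latency.
    """
    buffer: List[Record] = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= expected_count:
            break

    if len(buffer) < expected_count:
        # The source stream ended early; nothing can be derived.
        return []

    # Restore the intended order before processing.
    buffer.sort(key=lambda record: record[0])

    derived: List[float] = []
    total = 0.0
    for _, value in buffer:
        total += value
        derived.append(total)  # derived data: running total in sequence order
    return derived


in_order = process_after_complete([(2, 3.0), (0, 1.0), (1, 2.0)], 3)
print(in_order)  # [1.0, 3.0, 6.0]
```

In this sketch, a downstream node receives nothing at all while the buffer fills, and a source stream that terminates early yields no derived data.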
Errors earlier in a streaming workflow may drastically affect data generated later in the streaming workflow. For example, if a first processing node receives late source data after generating and sending derived data downstream, then the derived data already sent may be inaccurate or false. Processing nodes further downstream may generate additional derived data based on the inaccurate or false derived data generated upstream. The additional derived data may therefore also be inaccurate or false. Thus, gracefully and efficiently processing data in a streaming workflow is valuable.
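The propagation of such an error may be illustrated with the following sketch. It is an assumption-laden example only: records are assumed to be (timestamp, value) pairs, the first node is assumed to emit a per-window sum once a record for a later window arrives, and the downstream node is assumed to flag windows whose sum falls below a threshold. All names and values are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def window_sums(records: Iterable[Tuple[int, int]],
                window_size: int) -> List[Tuple[int, int]]:
    """Hypothetical first node: emit (window, sum) when a later window starts.

    Source data that arrives for an already-emitted window is dropped,
    because the derived data for that window has already been sent.
    """
    sums: Dict[int, int] = {}
    emitted: Set[int] = set()
    derived: List[Tuple[int, int]] = []
    for timestamp, value in records:
        window = timestamp // window_size
        if window in emitted:
            continue  # late source data: cannot be incorporated
        sums[window] = sums.get(window, 0) + value
        for earlier in sorted(w for w in sums if w < window and w not in emitted):
            derived.append((earlier, sums[earlier]))
            emitted.add(earlier)
    return derived


def low_windows(derived: Iterable[Tuple[int, int]], threshold: int) -> List[int]:
    """Hypothetical downstream node: flag windows whose sum is below threshold."""
    return [window for window, total in derived if total < threshold]


# The record (2, 5) belongs to window 0 but arrives after window 0 was emitted.
records = [(0, 5), (1, 5), (10, 4), (2, 5), (20, 1)]
upstream = window_sums(records, window_size=10)
print(upstream)                   # [(0, 10), (1, 4)]
print(low_windows(upstream, 12))  # [0, 1]
```

Had the late record been included, the sum for window 0 would have been 15 and the downstream node would not have flagged that window; the upstream inaccuracy carries through to the downstream derived data.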
While each of the drawing figures illustrates a particular embodiment for purposes of illustrating a clear example, other embodiments may omit, add to, reorder, and/or modify any of the elements shown in the drawing figures. For purposes of illustrating clear examples, one or more figures may be described with reference to one or more other figures, but using the particular arrangement illustrated in the one or more other figures is not required in other embodiments.