In some dataflows, a given action can be executed multiple times during the dataflow, each time with various dependent transformations. To improve the performance of such dataflows, some dataflow engines provide mechanisms to persist the output of a transformation using a caching operation, thereby avoiding the re-execution of preceding operations. The caching operation indicates that the dataset produced by an operation should be kept in memory for future reuse, without the need for re-computation.
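The effect of a caching operation described above can be illustrated with a minimal sketch. It assumes a hypothetical lazily evaluated dataset abstraction; the names `LazyDataset`, `map`, `cache`, `sum`, `count`, and `run` are illustrative only and do not correspond to the API of any particular dataflow engine. Transformations are merely recorded, and each action re-executes the recorded chain unless the intermediate dataset has been marked for caching.

```python
from typing import Callable, List, Optional


class LazyDataset:
    """Hypothetical lazy dataset: transformations are deferred until an action runs."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute            # deferred computation (the lineage)
        self._cache_enabled = False
        self._cached: Optional[List[int]] = None

    @staticmethod
    def from_list(items: List[int]) -> "LazyDataset":
        return LazyDataset(lambda: list(items))

    def map(self, fn: Callable[[int], int]) -> "LazyDataset":
        # Transformation: returns a new dataset that depends on this one.
        return LazyDataset(lambda: [fn(x) for x in self._materialize()])

    def cache(self) -> "LazyDataset":
        # Caching operation: keep the output in memory after its first computation.
        self._cache_enabled = True
        return self

    def _materialize(self) -> List[int]:
        if self._cache_enabled:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()             # no cache: re-run the whole chain

    # Actions: each one triggers evaluation of the dataset.
    def sum(self) -> int:
        return sum(self._materialize())

    def count(self) -> int:
        return len(self._materialize())


def run(use_cache: bool):
    """Run two actions on one transformed dataset and count transformation calls."""
    calls = {"expensive": 0}

    def expensive(x: int) -> int:
        calls["expensive"] += 1            # track how often the transformation runs
        return x * x

    squared = LazyDataset.from_list([1, 2, 3, 4]).map(expensive)
    if use_cache:
        squared.cache()

    total = squared.sum()                  # first action
    n = squared.count()                    # second action
    return total, n, calls["expensive"]


if __name__ == "__main__":
    print("without cache:", run(use_cache=False))
    print("with cache:   ", run(use_cache=True))
```

In this sketch, both runs return the same results for the two actions, but the transformation is applied to every element once per action without caching and only once in total with caching. The sketch fixes the cached dataset in advance; it does not address how to choose which datasets to cache while a dataflow is running.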
The use of a caching operation potentially avoids the increased cost incurred by multiple actions in a dataflow. In the case of real-time dataflow executions, however, identifying the datasets to cache as the dataflow progresses is not trivial.
A need therefore exists for techniques for dynamic placement of cache operations during the execution of such dataflows.