Big data dataflows usually correspond to multiple sequences of complex operations that execute transformations over large distributed datasets. When an infrastructure is created for such dataflows, its resources should be sufficient to deal with peaks of demand. Thus, outside of those peaks, resources are often idle for periods of time. When an operation that will be executed in the future can be pre-processed, taking advantage of such idle resources, there is a potential benefit for both the users and the infrastructure management.
U.S. patent application Ser. No. 15/191,755, filed Jun. 24, 2016, entitled “Methods and Apparatus for Data Pre-Processing in Dataflow Operations,” incorporated by reference herein, discloses a method to leverage idle resources to preemptively execute activities that are likely to be necessary in the future. A need exists for techniques for managing which operations should be preemptively executed, before being explicitly started by a user.