A workflow engine is a software system that runs a large number of tasks. Each task is typically the invocation of an executable program. Tasks may have precedence relationships between them, so a workflow can be viewed as a directed graph in which each node represents a task to be performed and each edge represents a precedence relationship. In the majority of cases, the workflow task graph is acyclic.
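As a minimal illustrative sketch of the graph view described above, a workflow may be held as a mapping from each task to the tasks that must precede it, and a valid run order obtained by topological sorting. The task names and the helper `execution_order` are hypothetical and are not taken from any particular workflow engine; the sketch relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a task, and each value is the set of
# tasks that must finish before that task may start (its precedence edges).
workflow = {
    "fetch": set(),
    "align": {"fetch"},
    "count": {"align"},
    "report": {"align", "count"},
}


def execution_order(predecessors):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes forming the cycle.
        raise ValueError(f"workflow is not acyclic: {exc.args[1]}") from exc


order = execution_order(workflow)
```

Any order returned this way places every task after all of its predecessors; a cyclic graph is rejected rather than run.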
Typically, users run a set of workflows in one go. Within such a set, users may create copies of a workflow for different inputs, and may also create new workflows by slightly modifying existing ones. Treating each workflow as a separate entity is therefore sub-optimal. To minimize computation across the workflow set and take advantage of the common structure, it would be desirable to merge all of the workflow directed acyclic graphs (DAGs) into a single graph.
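One possible way to merge such DAGs, shown here only as a sketch under stated assumptions, is to identify each node by its command together with the identities of its parents, so that tasks sharing both are represented once in the merged graph. The workflow encoding, the command strings, and the function `merge_workflows` are hypothetical, and this purely structural rule is an assumption made for illustration rather than a description of any specific engine.

```python
def merge_workflows(workflows):
    """Merge workflow DAGs into one graph, sharing structurally identical nodes.

    Each workflow maps a task name to (command, list of parent task names).
    Returns a dict mapping each merged node key to the set of its parent keys.
    Assumes every input workflow is acyclic.
    """
    merged = {}
    for wf in workflows:
        keys = {}  # task name -> merged node key, local to this workflow

        def key_of(name):
            if name not in keys:
                command, parents = wf[name]
                parent_keys = frozenset(key_of(p) for p in parents)
                # A node is identified by its command and its parents' keys.
                keys[name] = (command, parent_keys)
                merged.setdefault(keys[name], set()).update(parent_keys)
            return keys[name]

        for name in wf:
            key_of(name)
    return merged


# Two hypothetical workflows that differ only in their final step.
wf_a = {
    "fetch": ("fetch data.csv", []),
    "clean": ("clean", ["fetch"]),
    "plot": ("plot --bar", ["clean"]),
}
wf_b = {
    "fetch": ("fetch data.csv", []),
    "clean": ("clean", ["fetch"]),
    "plot": ("plot --line", ["clean"]),
}
merged = merge_workflows([wf_a, wf_b])
```

In this example the six original tasks collapse to four merged nodes, because the shared `fetch` and `clean` steps are represented only once. Copies of a workflow run on different inputs would share nothing under this rule unless their commands coincide.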
Existing approaches disadvantageously do not take dynamic information into account. Two nodes may have different parents yet perform the same computation. Two nodes may also perform different computations yet use the same software. Hence, it would be desirable to run such nodes on the same machine, for example, to take advantage of caching.
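The co-location idea above can be illustrated with a small sketch that groups tasks by the executable they invoke and places each group on one machine. The function `assign_by_software`, the task commands, and the machine names are hypothetical; taking the first token of the command line as "the software" and distributing groups round-robin are simplifying assumptions made only for this example.

```python
import shlex
from collections import defaultdict


def assign_by_software(tasks, machines):
    """Place tasks so that those invoking the same executable share a machine.

    tasks: dict mapping task name to its command line.
    machines: non-empty list of machine names.
    Returns a dict mapping task name to machine name.
    """
    groups = defaultdict(list)
    for name, command in tasks.items():
        # Assumption: the first token of the command names the software.
        executable = shlex.split(command)[0]
        groups[executable].append(name)

    placement = {}
    # Sort the executables so the assignment is deterministic.
    for i, executable in enumerate(sorted(groups)):
        machine = machines[i % len(machines)]
        for name in groups[executable]:
            placement[name] = machine
    return placement


tasks = {
    "a": "blast --db x q1.fa",
    "b": "blast --db y q2.fa",
    "c": "samtools sort in.bam",
}
placement = assign_by_software(tasks, ["m1", "m2"])
```

Here tasks `a` and `b` perform different computations but use the same software, so they land on the same machine, where a cached copy of that software could be reused.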