Systems and methods herein generally relate to automated workflows, such as document production and document processing workflows, and to making workflows run more efficiently.
In automated document workflows, sets of documents are processed according to the steps described in a workflow, typically with little or no user intervention. A job may include a number of documents that should be processed together according to the rules described in a workflow. A workflow can be represented as a graph with nodes representing tasks and directed edges representing transitions between tasks. In this disclosure, the words “task” and “node” are used interchangeably. In a simple job execution, all documents in a job transition together from one task to the next, beginning at the starting node and finishing at one terminal node. In some situations, a job should be partitioned into two or more independent pieces of work that, for the purpose of this disclosure, are called sub-jobs. Each sub-job can follow an independent path of execution and can have, at any point in time, a different active task. Sub-jobs are useful when different parts of a job require different types of processing. Sub-jobs also allow for the parallel execution of activities.
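The graph representation and the simple job execution described above can be sketched as follows. This is a minimal illustration only: the task names, the `WORKFLOW` mapping, and the `run_simple_job` function are hypothetical and are not part of any particular workflow definition language.

```python
# Hypothetical workflow graph: each key is a task (node) and each value
# lists the tasks reachable through its outgoing directed edges.
WORKFLOW = {
    "start": ["preflight"],
    "preflight": ["impose"],
    "impose": ["print"],
    "print": [],  # terminal node: no outgoing transitions
}


def run_simple_job(workflow, start, documents):
    """Move all documents together from task to task; return the path taken."""
    path = [start]
    node = start
    while workflow[node]:
        # A simple job has no branching, so each task has exactly one
        # outgoing edge; unpacking fails loudly if that assumption breaks.
        (node,) = workflow[node]
        path.append(node)
    return path, documents


path, docs = run_simple_job(WORKFLOW, "start", ["a.pdf", "b.pdf"])
```

In this sketch every document visits the same tasks in the same order, which is exactly the behavior that sub-jobs are intended to relax.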
To support sub-jobs, workflow definition languages include a special task that enables the creation of a number of sub-jobs from a given job/sub-job and another special task that combines one or more sub-jobs into a single sub-job/job. In the context of this disclosure, those special tasks are called “split” and “join,” respectively. The join task affects a number of executing sub-jobs. These executing sub-jobs, upon their arrival at the join task, are combined into a single sub-job. The difficulty in implementing the join task is that it is hard to know which executing sub-jobs in the system are ever going to arrive at that particular join task. When the first sub-job arrives at the join task, the system should decide whether it needs to wait for other sub-jobs or continue. If it decides to wait, a similar decision should be made when a second sub-job arrives at the task, and so on for each later arrival: the system should either wait for additional sub-jobs, or combine all sub-jobs already at the join task into a single sub-job and continue. Given that the workflow processing engine is an abstraction of a general computation system, the general problem is, by definition, undecidable. Nevertheless, for practical purposes a solution could be useful in the narrow context of document workflows.
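The split and join tasks can be illustrated with the following minimal sketch. The `SubJob` class, the `split` and `join` functions, and the `route` callback are hypothetical names introduced for illustration; they do not correspond to any specific workflow definition language.

```python
from dataclasses import dataclass, field


@dataclass
class SubJob:
    """A job or sub-job, modeled here as nothing more than a list of documents."""
    documents: list = field(default_factory=list)


def split(job, route):
    """Partition a job into sub-jobs, one per branch chosen by `route`."""
    groups = {}
    for doc in job.documents:
        groups.setdefault(route(doc), []).append(doc)
    return {branch: SubJob(docs) for branch, docs in groups.items()}


def join(sub_jobs):
    """Combine the given sub-jobs into a single sub-job."""
    merged = SubJob()
    for sub_job in sub_jobs:
        merged.documents.extend(sub_job.documents)
    return merged


job = SubJob(["cover.pdf", "body.pdf", "index.pdf"])
parts = split(job, lambda doc: "color" if doc == "cover.pdf" else "grayscale")
rejoined = join(parts.values())
```

Note that `join` in this sketch is handed the complete set of sub-jobs by its caller. A running workflow engine has no such list available, and deciding when the set of arrivals is complete is precisely the difficulty discussed above.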
One desirable characteristic of a solution to this problem is repeatability: the outcome should depend only on the input documents and the transformations described in the workflow. The outcome should not depend on, for example, how long it takes to process a particular sub-job at a particular task or set of tasks. Another desirable characteristic is that the specification of a join task should not significantly constrain the types of workflows that can be processed by the system.
In general workflow systems, constructs similar to the join task described above are available. Some systems can specify an activity whose execution only starts when a number of incoming execution flows reach that activity. However, such systems must know in advance how many of these flows will actually contain sub-jobs at run time, and they do not account for the case in which multiple sub-jobs follow the same path. For example, a document workflow may split a job into potentially five different branches requiring different processing. At run time, the documents of a five-document job may be split so that three documents travel through one branch, two documents travel through another branch, and no documents go through any of the other branches. Only when all five of these documents reach the join node should the workflow be allowed to continue.
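The five-branch example above can be worked through in a short sketch. The branch names, document names, and both readiness functions are hypothetical; the second function merely restates the condition given in the example (all five documents have arrived) and is not presented as the mechanism of any existing system.

```python
# Hypothetical static definition: the split declares five outgoing branches.
BRANCHES = ["A", "B", "C", "D", "E"]

# Hypothetical run-time routing: three documents take branch A, two take
# branch B, and branches C, D, and E receive nothing.
ROUTING = {"d1": "A", "d2": "A", "d3": "A", "d4": "B", "d5": "B"}


def flow_count_join_ready(arrived_branches, declared_branches):
    """Join that waits until every declared incoming flow has delivered work."""
    return set(arrived_branches) == set(declared_branches)


def document_join_ready(arrived_documents, job_documents):
    """Join that waits until every document of the job has arrived."""
    return set(arrived_documents) == set(job_documents)


# Every document has reached the join node.
arrived_documents = list(ROUTING)
arrived_branches = set(ROUTING.values())

flow_based = flow_count_join_ready(arrived_branches, BRANCHES)
document_based = document_join_ready(arrived_documents, ROUTING.keys())
```

Under these assumptions the flow-counting join never becomes ready, because three of the five declared flows carry no sub-jobs, even though all five documents have arrived.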
Other workflow systems pair split and join activities. In such systems, all the control flows that are created at a given split are collapsed at the corresponding join. This approach significantly limits the number of document workflows that can be processed. For example, a workflow may start with two input documents requiring different initial pre-press steps. At some point, each one of these documents goes into a split node that divides the document into a color portion and a grayscale portion. After some more processing (for example, color correction), it may be beneficial to join the two color sub-jobs (and/or the two grayscale sub-jobs) to impose them together before printing. However, in this scenario, the sub-jobs that contain the color portions were not all created in the same split, so this particular workflow could not be modeled using paired splits and joins.
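The limitation of paired splits and joins in the color/grayscale scenario can be sketched as follows. The `SubJob` fields, the `split_id` labels, and the helper functions are hypothetical and serve only to make the origin of each sub-job explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubJob:
    source_document: str
    portion: str   # "color" or "grayscale"
    split_id: str  # hypothetical label for the split that created this sub-job


def split_by_color(document, split_id):
    """Divide one document into a color sub-job and a grayscale sub-job."""
    return [
        SubJob(document, "color", split_id),
        SubJob(document, "grayscale", split_id),
    ]


def paired_join_inputs(sub_jobs, split_id):
    """A paired join only collapses sub-jobs created by its own split."""
    return [sj for sj in sub_jobs if sj.split_id == split_id]


# Two input documents, each split separately after its own pre-press steps.
sub_jobs = split_by_color("doc1", "split-1") + split_by_color("doc2", "split-2")

# The desired join: both color portions, to be imposed together.
color = [sj for sj in sub_jobs if sj.portion == "color"]
color_split_ids = {sj.split_id for sj in color}

# Check whether any single paired join receives both color sub-jobs.
covered_by_one_pair = any(
    set(color) <= set(paired_join_inputs(sub_jobs, split_id))
    for split_id in color_split_ids
)
```

In this sketch the two color sub-jobs carry different `split_id` values, so no single paired join receives both of them, which matches the limitation described above.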