Computational tasks in a workflow are often distributed among a plurality of computing machines in order to gain efficiency. However, this distribution of tasks can be problematic when the tasks are not completely independent of one another, because such tasks cannot simply be divided up and distributed to be performed in parallel. In some cases, the resulting outputs of tasks of a single type may need to be merged or combined in order to allow successive tasks in the workflow to be performed.
Furthermore, different types of tasks in the workflow may take up varying amounts of computing resources. There may be even greater variation in the required amount of computing resources depending on the underlying data operated on in those tasks. The different machines may also have their own varying levels of resource utilization. All of these factors may make it difficult to distribute tasks among the plurality of computing machines in order to achieve even resource utilization across those machines.
Task processing can be further complicated when performance of a first task produces multiple, conflicting results. When a second task depends on a result of the first task, such conflicting results can slow a workflow and/or hamper a quality and/or accuracy of a result of the second task. For example, one potential response to detecting multiple, conflicting results is to trigger a task to re-verify input data for the first task. This approach then requires a verification task to be performed, and potentially the first task to be repeated, before the second task can be performed, thereby introducing a delay. Another approach is to discard the results, though this approach can degrade a quality of a result produced by the second task.
There exists a need for techniques that allow the processing of computational tasks in a workflow to be distributed across a plurality of computing machines in a manner that accounts for the type of task, the data operated on in those tasks, and the varying resource utilization of the machines. There is a further need for techniques to address conflicting results produced in early tasks in a workflow.