Software workflows, such as test automation and batch executions, face challenges of robustness. Such workflows may execute many software components, each of which may depend on the result(s) of one or more previously executed components. Moreover, each component may take a long time to execute. As such, an error in one of the components may cause the workflow to terminate prematurely, without returning the expected result. Sometimes the error is transitory, in which case re-running the workflow may be enough for the error to disappear. In other scenarios, the error represents a programming defect that must be investigated by a software engineer. To reproduce the error, the software engineer must re-run the workflow to re-create the state in which the error occurred. In both scenarios, re-running the workflow is costly in time and computing resources.
Therefore, there is a need for an improved framework that addresses the above-mentioned challenges.