Planning the execution of different tasks (i.e., the control of their flow over time in order to achieve a desired objective) is a commonplace activity in a number of applications. A typical example is a workload scheduler, or simply scheduler, which is used in a data-processing system to control the execution of large quantities of data-processing jobs, or simply jobs (i.e., any work unit suitable to be executed thereon).
For this purpose, the scheduler creates a workload plan, or simply a plan (establishing the flow of execution of the jobs); particularly, the plan is created by arranging the jobs in succession according to a desired execution time thereof.
The execution of some jobs may also be constrained by one or more dependencies on other jobs (for example, when the jobs process the results of the other jobs). Generally, each dependency indicates a job (referred to as predecessor job), whose execution enables the execution of another job (referred to as successor job). Moreover, the dependency may also indicate a dependency time (for example, a dependency timeframe), which further requires that the execution time of the predecessor job meets the dependency timeframe to enable the execution of the successor job (for example, when the execution time of the predecessor job falls within the dependency timeframe); the dependency timeframe may be specified either in absolute terms (for example, a fixed time interval) or in relative terms with respect to the execution time of the successor job (for example, a time interval around it).
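By way of illustration only, the notion of a dependency with an optional timeframe may be sketched as follows. The sketch assumes that execution times are expressed as integer minutes; the names Timeframe, Dependency, bounds and is_met are hypothetical and are not taken from any actual scheduler.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Timeframe:
    """Dependency timeframe (hypothetical model, times in integer minutes).

    When relative is False, start and end are absolute times.
    When relative is True, they are offsets from the successor's execution time.
    """
    start: int
    end: int
    relative: bool = False

    def bounds(self, successor_time: int) -> Tuple[int, int]:
        # Convert relative offsets into an absolute interval when needed.
        if self.relative:
            return successor_time + self.start, successor_time + self.end
        return self.start, self.end


@dataclass(frozen=True)
class Dependency:
    """A successor's dependency on a predecessor job of a given type."""
    predecessor_type: str
    timeframe: Optional[Timeframe] = None

    def is_met(self, job_type: str, job_time: int, successor_time: int) -> bool:
        # The candidate must be of the required type ...
        if job_type != self.predecessor_type:
            return False
        # ... and, if a timeframe is given, must execute within it (inclusive).
        if self.timeframe is None:
            return True
        low, high = self.timeframe.bounds(successor_time)
        return low <= job_time <= high
```

For example, Dependency("backup", Timeframe(-60, 30, relative=True)) would accept a "backup" job executed between 60 minutes before and 30 minutes after the successor job, whereas Dependency("backup", Timeframe(600, 720)) would accept one executed in the fixed interval from minute 600 to minute 720.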
The plan may comprise the execution of multiple instances of some jobs of the same type. Therefore, each dependency is to be resolved by identifying the actual predecessor job to be linked to the successor job, by selecting it among all the eligible jobs of the required type having the execution time within the dependency timeframe. The selection of the predecessor job is performed according to pre-defined rules. For example, the predecessor job may be selected as the eligible job whose execution time is closest to the execution time of the successor job and not after it (including the case in which the two execution times are the same); if no such job exists, the predecessor job is selected as the eligible job whose execution time is closest to the execution time of the successor job after it.
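The exemplary selection rule described above may be sketched as follows. This is a minimal illustration under stated assumptions: jobs are modeled as hypothetical (type, execution time) pairs with integer times, the dependency timeframe is passed as an already-resolved absolute interval, and the function name select_predecessor is an assumption introduced for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical job model: (job type, execution time as an integer).
Job = Tuple[str, int]


def select_predecessor(
    jobs: List[Job],
    required_type: str,
    successor_time: int,
    window: Tuple[int, int],
) -> Optional[Job]:
    """Pick the predecessor job according to the exemplary rule.

    Eligible jobs have the required type and an execution time inside
    the (inclusive) window. The closest eligible job at or before the
    successor's execution time is preferred; otherwise the closest
    eligible job after it is chosen. Returns None if nothing is eligible.
    """
    low, high = window
    eligible = [
        job for job in jobs
        if job[0] == required_type and low <= job[1] <= high
    ]

    # First choice: latest eligible job not after the successor.
    before = [job for job in eligible if job[1] <= successor_time]
    if before:
        return max(before, key=lambda job: job[1])

    # Fallback: earliest eligible job after the successor.
    after = [job for job in eligible if job[1] > successor_time]
    if after:
        return min(after, key=lambda job: job[1])

    return None
```

With jobs [("A", 100), ("A", 200), ("A", 300)] and a successor job executed at time 250, a window of (0, 1000) selects ("A", 200), whereas a window of (260, 400) leaves no eligible job before the successor and therefore selects ("A", 300). The sketch examines the whole job list on every call.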
The resolution of each dependency then involves searching for the corresponding predecessor job throughout all the jobs (for example, in the plan when a new job is to be added thereto). However, this requires several scans of the jobs, backwards and forwards. Therefore, the resolution of the dependencies may become quite time-consuming, especially when a very high number of jobs are to be executed (for example, several thousands in complex production environments), with a corresponding increase of the time required for creating the plan. All of the above has a detrimental effect on the performance of the scheduler, and thus of the whole data-processing system.
Several techniques have been proposed in the art for improving the performance of a scheduler, for example, as disclosed in the following documents (the entire disclosures of which are herein incorporated by reference).
Particularly, U.S. Pat. No. 8,392,397 discloses a selection of provenance dependency function, US-A-2011/0246998 discloses a reorganization of tasks to achieve optimization of resources, US-A-2009/0241117 discloses an integration of flow orchestration and scheduling, US-A-2012/0084789 discloses an aggregation topology for optimizing task graphs, U.S. Pat. No. 7,661,090 discloses an optimization of operation sequence to allow manual operations to be combined when feasible, US-A-2009/0013322 discloses a scheduler allowing task overruns, US-A-2005/0149927 discloses a determination of executability of tasks based on required resource amount, U.S. Pat. No. 8,296,170 discloses a management of deadlines for completing activities, and “Optimizing Grid-Based Workflow Execution” by Gurmeet Singh, Carl Kesselman and Ewa Deelman, Journal of Grid Computing (2006), discloses an application of abstraction refinement to large-scale static scheduling problems.
However, the planning techniques known in the art are not completely satisfactory, especially with respect to the resolution of the dependencies.