Scheduling of different work units (for example, batch jobs) is a commonplace activity in data processing systems. For this purpose, workload schedulers are available to automate the submission of large quantities of jobs according to a specific plan (for example, based on constraints defined by the completion of other jobs); an example of a commercial scheduler is the “IBM Tivoli Workload Scheduler (TWS)” by IBM Corporation.
Typically, the scheduler controls the submission of the jobs on a set of execution servers from a central scheduling server. This makes it possible to implement systems that are very powerful and easily scalable. Moreover, the same structure ensures high reliability (since the plan can be run even in case of failure of one or more execution servers). Workload-balancing techniques may also be exploited to optimize the distribution of the jobs among the execution servers.
In this case, the scheduler selects the execution servers for the jobs dynamically when they are submitted for execution. For this purpose, each job specifies any hardware and/or software resources that are required for its execution (such as microprocessors, RAM, operating systems, software applications, databases, and the like); the scheduler determines the execution servers having the required resources and then selects one of them for executing the job (for example, according to their current workloads).
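The dynamic selection described above can be illustrated with a minimal sketch. The names `Server`, `Job` and `select_server` are hypothetical, resources are simplified to plain string labels, and the workload is assumed to be a single number per server; an actual scheduler would use much richer descriptions.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    resources: set      # labels of the resources available (assumed model)
    workload: int = 0   # current load, e.g. number of running jobs (assumed metric)


@dataclass
class Job:
    name: str
    required: set       # labels of the resources the job needs


def select_server(job, servers):
    """Return the least-loaded server that has every required resource, or None."""
    # Step 1: keep only the servers that have all the required resources.
    eligible = [s for s in servers if job.required <= s.resources]
    if not eligible:
        return None
    # Step 2: choose among them according to the current workload.
    return min(eligible, key=lambda s: s.workload)


servers = [
    Server("a", {"linux", "db2"}, workload=5),
    Server("b", {"linux", "db2"}, workload=2),
    Server("c", {"windows"}, workload=0),
]
job = Job("payroll", {"linux", "db2"})
selected = select_server(job, servers)
```

In this example, server `c` is excluded even though it is idle, because it lacks the required resources; server `b` is then preferred to server `a` because of its lower workload.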
Document US-A-2006/0080666 (the entire disclosure of which is herein incorporated by reference) proposes a solution for managing more complex environments. In this case, each job specifies the required resources by means of formal definitions based on their properties. Moreover, the description of the job may include the specification of relationships that must be satisfied with other resources; those relationships are in turn specified by means of formal definitions based on the properties of the other resources. For example, it is possible to indicate that the job must be submitted on an execution server having an operating system of a specific type and accessing another computer that runs a specific application.
The identification of the actual resources that satisfy the above-mentioned conditions is quite complex. For this purpose, the cited document proposes an iterative method. In particular, a set of eligible resources possessing the desired properties is first selected. A loop is then performed to reduce this set by removing the eligible resources that cannot satisfy all the relationships; the loop ends when a pass removes no further eligible resource.
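The general shape of such an iterative reduction can be sketched as follows. This is only an illustration under stated assumptions, not the method of the cited document: the function `eligible_resources`, the representation of resources as property dictionaries, the named "slots" for the required resources, and the `links` set of actual relationships are all hypothetical.

```python
def eligible_resources(resources, slots, relationships, links):
    """Reduce the eligible resources of each slot until nothing more is removed.

    resources:     dict mapping a resource name to its properties
    slots:         dict mapping a slot name to the property values it requires
    relationships: list of (source_slot, relation, target_slot) to be satisfied
    links:         set of (source_name, relation, target_name) that actually hold
    """
    # Initial selection: resources possessing the desired properties.
    eligible = {
        slot: {
            name
            for name, props in resources.items()
            if all(props.get(key) == value for key, value in wanted.items())
        }
        for slot, wanted in slots.items()
    }

    # Repeat until a full pass removes no further eligible resource.
    changed = True
    while changed:
        changed = False
        for src, rel, dst in relationships:
            # Keep a source only if it is related to some eligible target.
            keep_src = {
                r for r in eligible[src]
                if any((r, rel, t) in links for t in eligible[dst])
            }
            # Keep a target only if some remaining source is related to it.
            keep_dst = {
                t for t in eligible[dst]
                if any((r, rel, t) in links for r in keep_src)
            }
            if keep_src != eligible[src] or keep_dst != eligible[dst]:
                eligible[src], eligible[dst] = keep_src, keep_dst
                changed = True
    return eligible


resources = {
    "srv1": {"os": "linux"},
    "srv2": {"os": "linux"},
    "srv3": {"os": "windows"},
    "host1": {"app": "payroll"},
    "host2": {"app": "mail"},
}
slots = {"server": {"os": "linux"}, "peer": {"app": "payroll"}}
relationships = [("server", "accesses", "peer")]
links = {
    ("srv1", "accesses", "host1"),
    ("srv2", "accesses", "host2"),
    ("srv3", "accesses", "host1"),
}
result = eligible_resources(resources, slots, relationships, links)
```

Here `srv1` and `srv2` both have the desired operating system, but `srv2` is removed because it does not access any computer running the required application. Each pass re-examines every relationship against the current sets, which gives an idea of why the cost of this approach grows quickly with the number of resources and relationships.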
However, the above-described iterative method is very time consuming. This has a detrimental effect on the performance of the scheduler.
As an extreme case, the application of the proposed solution may not even be feasible in specific environments (with very strict time constraints).
Moreover, the relatively long time required by the iterative method makes it impossible to ensure that the information used remains consistent throughout the whole process (for example, because the resources may have changed in the meantime). Therefore, the integrity of the results obtained cannot be guaranteed.
This can cause errors in the execution of the jobs (for example, when some jobs are submitted to execution servers that do not actually have the required resources). This is very serious when the error affects a critical job (on which many other jobs depend for their execution); as a result, the execution of the whole plan (or at least of a substantial part thereof) may be blocked.
All of the above hinders the widespread application of the proposed solution; this drawback is more acute in environments where its implementation would be particularly useful (for example, in distributed systems, especially those based on the Internet).