Job scheduling and workload balancing among a plurality of resources connected by a network is an increasingly important component of an IT environment. Many grid computing environments are driven by the scheduling of work across a distributed set of resources (e.g., computation, storage, communication capacity, software licenses, special equipment, etc.). In essence, scheduling is an optimization problem, which is fairly straightforward when only one resource type is involved. However, whilst further performance improvements can be achieved by including more resource variables in the scheduling process, the resulting multivariate optimization becomes a difficult mathematical problem.
State-of-the-art job scheduling systems normally employ a master/agent architecture, wherein jobs are set up, scheduled and administered from a central server (known as a “master” server). The actual work is done by agents installed on the other servers. In use, the master maintains and interprets information relating to the jobs, the available servers, etc., so as to decide where to assign jobs. The agents, in turn, await commands from the master, execute the commands, and return an exit code to the master. While the master/agent architecture allows tight control over jobs, the need for the master and agents to remain synchronized (and the corresponding dependency on the availability of the network and the master) is a serious limitation of the architecture. Relatedly, the highly centralized nature of network traffic between the master and agents can degrade the overall performance of the architecture. Another problem is the limited scalability of the master/agent architecture. In particular, a master can support only a limited number of agents, and creating a new master or instance creates a new and separate administration, so that the more instances are created, the more complex the administration activities become.
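By way of illustration only, the master/agent interaction described above can be sketched as follows. The sketch is a minimal, single-process model and not the implementation of any particular scheduling product: the class names `Master` and `Agent`, the method names, and the "fewest completed jobs" assignment rule are all hypothetical choices made for this example, and commands are modeled as plain Python callables rather than real remote commands.

```python
class Agent:
    """Hypothetical agent: waits for a command, runs it, reports an exit code."""

    def __init__(self, name):
        self.name = name
        self.completed = 0  # number of commands this agent has executed

    def execute(self, command):
        # A command is modeled as a callable; any exception maps to exit code 1.
        try:
            command()
            code = 0
        except Exception:
            code = 1
        self.completed += 1
        return code


class Master:
    """Hypothetical master: holds all job state and decides where jobs run."""

    def __init__(self, agents):
        self.agents = agents
        self.results = {}  # job name -> (agent name, exit code)

    def dispatch(self, job_name, command):
        # Assumed assignment rule: pick the agent with the fewest completed jobs.
        agent = min(self.agents, key=lambda a: a.completed)
        self.results[job_name] = (agent.name, agent.execute(command))
        return self.results[job_name]


master = Master([Agent("agent-a"), Agent("agent-b")])
master.dispatch("ok-job", lambda: None)
master.dispatch("bad-job", lambda: 1 / 0)  # raises, so the agent reports 1
```

Every assignment decision and every exit code in this sketch passes through the single `Master` object, which mirrors the dependency on the master and the centralized traffic pattern noted above.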
European Patent Application No. 08154507.1, filed on 15 Apr. 2008 by the same Applicant, discloses a workload scheduling system which is highly scalable to accommodate increasing workloads within a heterogeneous distributed computing environment. More particularly, the preferred embodiment employs a modified average consensus algorithm to evenly distribute network traffic and jobs amongst a plurality of computers. State information from each computer is propagated to the rest of the computers by the modified average consensus algorithm, thereby enabling the preferred embodiment to dispense with the need for a master server, by allowing the individual computers themselves to select jobs which optimally match a desired usage of their own resources to the resources required by the jobs. A drawback of the above method is that the user establishes a virtual network comprising a logical topology of the computers. In other words, it is the user's responsibility to select the right topology, and a poor selection may jeopardize the efficiency of the network.
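By way of illustration only, the influence of the logical topology can be seen with the standard (unmodified) average consensus iteration, in which each computer repeatedly moves its value toward the values of its neighbors. The modified algorithm of the above application is not reproduced here; the function names, the load figures, the step sizes and the two example topologies below are assumptions chosen for this sketch.

```python
def consensus_step(values, neighbors, epsilon):
    """One synchronous update: x_i <- x_i + epsilon * sum_j (x_j - x_i)."""
    return [
        x + epsilon * sum(values[j] - x for j in neighbors[i])
        for i, x in enumerate(values)
    ]


def rounds_to_agree(values, neighbors, epsilon, tol=1e-6, max_rounds=10_000):
    """Iterate until all values lie within tol of each other."""
    for r in range(max_rounds):
        if max(values) - min(values) < tol:
            return r, values
        values = consensus_step(values, neighbors, epsilon)
    return max_rounds, values


def ring(n):
    # Each computer talks only to its two immediate neighbors.
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete(n):
    # Each computer talks to every other computer.
    return {i: [j for j in range(n) if j != i] for i in range(n)}


# Hypothetical per-computer load figures; their average is 3.0.
loads = [8.0, 0.0, 4.0, 0.0, 6.0, 0.0]

# The step size is kept below 1 / (maximum node degree) so the iteration converges.
ring_rounds, ring_values = rounds_to_agree(loads, ring(6), epsilon=0.3)
full_rounds, full_values = rounds_to_agree(loads, complete(6), epsilon=1 / 6)
```

Under these assumptions both topologies settle on the same average load, but the ring needs considerably more rounds than the fully connected topology, while the fully connected topology requires each computer to exchange state with every other computer in each round. This trade-off is the kind of choice that is left to the user when the topology is selected manually.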
It would be desirable to guide the user in this selection process or, even better, to be able to rely on a method which determines the best solution according to predetermined parameters.
It is an object of the present invention to provide a technique which alleviates the above drawback of the prior art.