The present invention relates to a method and a device for scheduling jobs so as to improve the load balance between the clusters of a clustered computer system.
Parallel processors have recently come into increasingly wide use in computer systems. Even general-purpose computers now commonly have a clustered structure, in which a plurality of processor groups, each sharing its own main memory, are coupled to a shared memory (i.e., a global memory). In such a structure, each processor group that shares a main memory is called a "cluster".
In a clustered computer system, the load must be balanced between the clusters to achieve satisfactory system performance. In a tightly-coupled multiprocessor system, the load is automatically shared among the processors at a nearly optimum level. This is because the queue of processes waiting for a processor is held in the shared main memory, so an idle processor immediately takes the next process to be executed. Typically, each process releases its processor every few milliseconds (ms) so that other jobs can run, and then queues up again; repeating this cycle keeps all processors productive.
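The self-balancing behaviour of a tightly-coupled multiprocessor can be sketched as a small simulation. This is a simplified model, not taken from the source: the function name `run_shared_queue`, the lock-step time slices, and the burst and quantum values are all hypothetical. All processors dispatch from one shared queue, and a process that has not finished after its slice rejoins the tail of that queue.

```python
from collections import deque


def run_shared_queue(bursts_ms, n_cpus, quantum_ms):
    """Simulate processors that all dispatch from one shared ready queue.

    bursts_ms:  total CPU time (ms) each process needs (hypothetical values)
    n_cpus:     number of processors sharing the main memory
    quantum_ms: time after which a running process releases its processor
    Returns the accumulated busy time (ms) of each processor.
    """
    ready = deque(bursts_ms)      # shared queue: remaining CPU time of each waiting process
    busy_ms = [0] * n_cpus
    while ready:                  # one iteration = one time slice on every processor
        requeued = []
        for cpu in range(n_cpus):
            if not ready:         # fewer waiting processes than processors: the rest stay idle
                break
            remaining = ready.popleft()   # an idle processor immediately takes the next process
            ran = min(quantum_ms, remaining)
            busy_ms[cpu] += ran
            if remaining > ran:
                requeued.append(remaining - ran)  # process releases the CPU and queues up again
        ready.extend(requeued)
    return busy_ms


print(run_shared_queue([30, 5, 12, 40, 8, 25], n_cpus=3, quantum_ms=4))  # [45, 35, 40]
```

No explicit balancing step is needed: every processor takes a process in each slice while at least `n_cpus` processes are waiting. The remaining differences in busy time come from processes that finish partway through a slice and from the final slices, when fewer processes remain than processors.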
However, in a clustered computer system, particularly in a batch processing system, moving a job that has started executing in one cluster to another cluster incurs a large overhead. Therefore, for such an assignment to be feasible, the unit of load assigned to a cluster must be a job that requires several minutes to several tens of minutes of processing time. Several to several tens of jobs run simultaneously on each cluster, and this group of jobs constitutes the workload at that time. The workload must be balanced between the clusters. However, the characteristics of the individual jobs waiting for execution (e.g., their processing time, their processor load ratio, etc.) are not known in advance.
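To illustrate why the unknown job characteristics matter, the following sketch assigns jobs to clusters by a naive round-robin rule. This is a hypothetical policy used only for illustration, not the scheduling method of the invention, and the function names and job lengths in minutes are likewise hypothetical. Job counts come out equal, but the work per cluster, which is known only after the jobs have run, can differ widely.

```python
def assign_by_count(n_jobs, n_clusters):
    """Round-robin: job j goes to cluster j % n_clusters, so job counts are equal."""
    return [list(range(c, n_jobs, n_clusters)) for c in range(n_clusters)]


def cluster_work(assignment, actual_minutes):
    """Total processing time per cluster; in practice known only after the jobs finish."""
    return [sum(actual_minutes[j] for j in jobs) for jobs in assignment]


# Hypothetical processing times (minutes) that the scheduler cannot see in advance.
actual_minutes = [5, 40, 10, 35, 8, 50]
assignment = assign_by_count(len(actual_minutes), 2)
print(assignment)                                # [[0, 2, 4], [1, 3, 5]]
print(cluster_work(assignment, actual_minutes))  # [23, 125]
```

Because a started job cannot cheaply be moved to another cluster, such an imbalance cannot simply be corrected afterwards, as the shared queue of a tightly-coupled system would correct it.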
If the main memory has sufficient capacity, keeping the utilization of all clusters near 100% is relatively easy, since a sufficiently large number of jobs can then be executed on every cluster. However, on-line processing is often performed on the same system, or batch jobs with a processing priority are run concurrently with ordinary batch jobs. In these cases, a "nearly 100% policy" is detrimental to the high-priority work. Thus, scheduling batch jobs to the respective clusters is a very serious and difficult problem in a clustered computer system.