Conventionally, computationally demanding jobs in science, technology and similar fields (for example, calculations requiring a few days to complete) are processed using, for example, parallel supercomputers and computer clusters (hereinafter referred to as “computational resources”) at computer centers and similar facilities. In general, such computational resources are shared by multiple users (for instance, several tens of researchers). Accordingly, in order to allocate the limited computational resources fairly among all the users, the execution of jobs submitted by the users has to be scheduled in an appropriate manner.
Execution of jobs in such a case is generally performed as follows.
(1) Users submit jobs to a job queue of a job scheduler.
(2) The job scheduler dispatches jobs in the job queue to the computational resources in order of priority of the jobs. This process is carried out until there are no more jobs in the job queue or until there are no more allocatable computational resources.
(3) When a trigger occurs that may allow a job in the job queue to be dispatched, for example a new job submission by a user or the completion of a job, the scheduling process of (2) above is performed again.
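Steps (1) to (3) above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular job scheduler. The class name JobScheduler, the unit "nodes" for computational resources, and the rule that dispatching stops when the highest-priority queued job does not fit (no backfilling) are assumptions made for the example.

```python
import heapq
import itertools


class JobScheduler:
    """Illustrative priority-ordered dispatcher (hypothetical, not from the source)."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.queue = []      # heap of (-priority, submission order, name, nodes)
        self.running = {}    # job name -> nodes in use
        self._seq = itertools.count()

    def submit(self, name, priority, nodes):
        # Step (1): the user submits a job to the job queue.
        heapq.heappush(self.queue, (-priority, next(self._seq), name, nodes))
        # Step (3): a submission is a trigger, so scheduling runs again.
        return self.dispatch()

    def complete(self, name):
        # Step (3): a job completion frees resources and is also a trigger.
        self.free_nodes += self.running.pop(name)
        return self.dispatch()

    def dispatch(self):
        # Step (2): dispatch in order of job priority until the queue is
        # empty or no allocatable resources remain.
        started = []
        while self.queue and self.free_nodes > 0:
            _, _, name, nodes = self.queue[0]
            if nodes > self.free_nodes:
                break  # assumption: strict priority order, no backfilling
            heapq.heappop(self.queue)
            self.free_nodes -= nodes
            self.running[name] = nodes
            started.append(name)
        return started
```

With four nodes in total, submitting a three-node job starts it at once; a later two-node job waits until the first job completes, even though its priority is higher, because only one node is free.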
Priorities of individual jobs (job priorities) are basically determined each time a new job is submitted, based on the order of job submission or on the priorities of the users associated with the jobs. Note that the priorities of the users (hereinafter referred to as “user priorities”) are used as an index to define the order of priority among the users, and are different from the job priorities. However, if the dispatch order is determined based only on the job priorities defined each time a new job is submitted, the computational resources might be monopolized by a user with a higher priority or by a large number of jobs submitted earlier by one user. In order to prevent such exclusive use of the computational resources by a single user, conventional job schedulers have a fair-share scheduling function. In a conventional fair-share scheduling function, if the amount of computational resources currently used by a user increases, or if the execution time of a job currently being executed becomes long, the user priority of that user is lowered so that the user priorities of the other users become relatively higher.
A first example of conventional fair-share scheduling uses the following equation (1). The term “priority” used in this example refers to the user priority.
Pd(t)=Ps/(1+F(t))  (1)
where t is the current time; Pd(t), the dynamic priority of the user concerned; Ps, the static priority of the user; and F(t), a function that increases as the elapsed time from the start of the job currently in execution (current job) becomes longer, that is, as the amount of computational resources used by the current job associated with the user increases (this function becomes 0 when the current job is completed).
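Equation (1) can be illustrated with a short sketch. The source does not give a concrete form for F(t); the linear function usage_penalty below, its rate parameter and the use of a node count are hypothetical choices made only so that the example can be evaluated.

```python
def usage_penalty(elapsed, nodes, rate=0.01):
    """Hypothetical F(t): grows with elapsed time and nodes used by the current job.

    Returns 0 when no job is in execution (elapsed is None), matching the
    statement that F(t) becomes 0 when the current job is completed.
    """
    if elapsed is None:
        return 0.0
    return rate * nodes * elapsed


def dynamic_priority(ps, f_t):
    """Equation (1): Pd(t) = Ps / (1 + F(t))."""
    return ps / (1.0 + f_t)
```

Under these assumptions, a user with Ps = 100 running a one-node job has a dynamic priority of about 50 after 100 time units, and the priority returns to 100 as soon as the job completes.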
FIG. 1 is a simplified representation showing transition of the dynamic priority according to the first example of the conventional fair-share scheduling.
As shown in FIG. 1, according to Equation (1), the dynamic priority Pd(t) decreases with the lapse of the job execution time (that is, it decreases as the utilization amount of computational resources increases). Accordingly, fairness among multiple users can be secured in terms of utilization amounts of computational resources and user priorities. However, according to Equation (1), the dynamic priority Pd(t) is set back to the static priority Ps immediately after the job completion. This possibly allows jobs of the same user to continuously use the computational resources. In addition, since the dynamic priority Pd(t) is high immediately after the start of the job execution, a large number of jobs of a single user may be dispatched at once. In order to avoid the occurrence of such situations, the following equation (2) is used which is formed by incorporating terms pertaining to an execution time period of jobs executed in the past (past jobs) and a commitment time period into Equation (1).
Pd(t)=Ps/(1+F(t)+Ch×(Th(t)+Tr(t))+Cc×(Tc(t)−Tr(t)))  (2)
where Ch is a coefficient of the execution time period of past jobs; Cc, a coefficient of the commitment time period; Th(t), the total execution time period of the past jobs (note that the execution time period of each past job is multiplied by an attenuation coefficient so as to decrease over time); Tr(t), the total execution time period of the current job; and Tc(t), the total commitment time period (expected execution time period) of the current job.
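Equation (2) can likewise be written as a small function. The sketch below takes the values of F(t), Th(t), Tr(t) and Tc(t) as already-computed inputs; the function name and the sample coefficient values in the note that follows are illustrative assumptions, not values taken from the source.

```python
def dynamic_priority_eq2(ps, f_t, th, tr, tc, ch, cc):
    """Equation (2): Pd(t) = Ps / (1 + F(t) + Ch*(Th(t)+Tr(t)) + Cc*(Tc(t)-Tr(t))).

    ps  : static priority Ps
    f_t : value of F(t) for the current job
    th  : attenuated total execution time of past jobs, Th(t)
    tr  : execution time of the current job so far, Tr(t)
    tc  : commitment (expected) execution time of the current job, Tc(t)
    ch, cc : coefficients Ch and Cc
    No clamping is applied to (tc - tr); the source does not state how an
    overrunning job is handled.
    """
    denominator = 1.0 + f_t + ch * (th + tr) + cc * (tc - tr)
    return ps / denominator
```

For example, with Ps = 100, Ch = Cc = 0.1 and a commitment time of 10, the priority is already about 50 at the instant the job starts (Tr = 0), whereas Equation (1) would still give 100 at that moment. Setting Ch = Cc = 0 reduces the function to Equation (1).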
FIG. 2 is a simplified representation showing transition of the dynamic priority according to the second example of the conventional fair-share scheduling.
“Ch×(Th(t)+Tr(t))” in Equation (2) corresponds to the term of the past job execution time period. The term of the past job execution time period is provided in order to prevent a user who has used a large amount of computational resources in the past from continuously using the computational resources after the completion of the current job. That is, by using the term of the past job execution time period, the dynamic priority Pd(t) is made to decrease according to the execution time period of past jobs. An attenuation coefficient is applied to the execution time period of each past job so that the contribution of that execution time period decreases with time. Note that, in FIG. 2, the curved line after the job completion represents the effect of the term of the past job execution time period. Specifically, according to the scheduling of FIG. 1 (Equation (1)), the dynamic priority Pd(t) is restored to the static priority Ps immediately after the completion of the current job; however, according to the scheduling of FIG. 2 (Equation (2)), the dynamic priority Pd(t) is gradually restored after the completion of the current job. In this way, the priority of the user who has used a large amount of computational resources is kept low for a while, thereby preventing continuous job execution by one user.
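The attenuated total Th(t) can be sketched as follows. The source states only that each past job's execution time period is multiplied by an attenuation coefficient that makes it decrease over time; the exponential decay with a half-life used here, and the names attenuated_past_time and half_life, are assumptions chosen for illustration.

```python
def attenuated_past_time(past_jobs, now, half_life):
    """Hypothetical Th(t): sum of past job durations, each decayed by its age.

    past_jobs : iterable of (end_time, duration) pairs for completed jobs
    now       : current time t
    half_life : time after which a past job counts for half its duration
                (assumed form of the attenuation coefficient)
    """
    total = 0.0
    for end_time, duration in past_jobs:
        age = now - end_time
        total += duration * 0.5 ** (age / half_life)
    return total
```

With a half-life of 10 time units, a job that ran for 100 units contributes 100 at the moment it completes, 50 after 10 units and 25 after 20 units, which yields a gradual recovery of Pd(t) of the kind described for FIG. 2.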
“Cc×(Tc(t)−Tr(t))” in Equation (2) corresponds to the term of the commitment execution time period. The term of the commitment execution time period is provided in order to prevent a large number of jobs of a single user from being dispatched at once. That is, by using the term of the commitment execution time period, the longer the expected remaining execution time period of the current job, the larger the reduction made in the dynamic priority Pd(t). The expected remaining execution time period is obtained by subtracting the elapsed time from the start of the current job execution from the commitment time period (the expected job execution time period reported by the user at the time of the job submission). In this way, the dynamic priority Pd(t) is made to decrease immediately after the start of the current job execution, thereby preventing a large number of jobs of a single user from being dispatched at once.
Patent Document 1: Japanese Laid-open Patent Application Publication No. 2006-48275
Patent Document 2: Japanese Laid-open Patent Application Publication No. H07-253893
However, the scheduling of Equation (2) has the problem that, although appropriate values must be assigned to the respective parameters (e.g. Ch, Cc and the attenuation coefficient), doing so is difficult because the parameters involved are large in number. That is, in general operational environments of job schedulers, the amount of computational resources available to each user in a given time frame is specified. Accordingly, it is preferable that the dynamic priorities Pd(t) of individual users be adjusted by the fair-share scheduling function in such a manner that these users are able to use the amounts of computational resources individually specified for them. Specifically, adjustments should be made such that the dynamic priority Pd(t) of a first user, whose job in execution is using a larger amount of computational resources than the amount specified for the first user, decreases in comparison with the dynamic priority Pd(t) of a second user, whose job in execution is using only a fraction of the amount of computational resources specified for the second user. In this way, it is possible to preferentially allow the second user to use the computational resources.
However, with the scheduling of Equation (2), it is very difficult to assign appropriate values to the parameters so as to achieve the above-described adjustments. In order to cope with this problem, an additional function is conventionally provided besides the fair-share scheduling function. The additional function serves to record the amounts of computational resources used by individual users and to control the job execution of a user if the recorded amount for that user exceeds the limit allowed for the user.
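The additional function described above can be sketched as a simple usage ledger. The class name UsageLimiter, the treatment of users without a configured limit as unlimited, and the choice to express "control" as a yes/no decision on whether further jobs may run are assumptions for this example; the source does not specify how the control is carried out.

```python
class UsageLimiter:
    """Illustrative per-user usage record with a limit check (hypothetical)."""

    def __init__(self, limits):
        # limits: mapping of user name -> allowed amount of computational resources
        self.limits = dict(limits)
        self.used = {user: 0.0 for user in self.limits}

    def record(self, user, amount):
        # Keep on record the amount of computational resources the user consumed.
        self.used[user] = self.used.get(user, 0.0) + amount

    def may_run(self, user):
        # Jobs are held back once the recorded amount exceeds the user's limit.
        # Assumption: a user with no configured limit is never held back.
        limit = self.limits.get(user, float("inf"))
        return self.used.get(user, 0.0) <= limit
```

A scheduler could consult may_run before dispatching a job, independently of the dynamic priority computed by the fair-share scheduling function.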