Hitherto, jobs that are associated with science, technology, or the like and that impose high computing loads (for example, computations that take several days to complete) have been executed utilizing parallel supercomputers disposed in computer centers or the like, computer clusters, or the like (hereinafter, referred to as a “computational resource”). Typically, such a computational resource is shared among a plurality of users (for example, tens of researchers). Accordingly, in order to assign the limited computational resource to the individual users fairly, it is necessary to appropriately schedule execution of jobs that are submitted by the individual users.
Such jobs are typically executed as follows: (1) users submit jobs to a job queue of a job scheduler; (2) the job scheduler executes a scheduling process of dispatching the jobs in the order of the priority levels of the individual jobs, and this process is executed until no job is left in the job queue or until no computational resource that can be assigned is left; and (3) when a trigger that allows a job in the job queue to be dispatched is generated, for example, by a user submitting a new job or by termination of a job, the job scheduler executes the scheduling process again.
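The scheduling process of step (2) can be sketched as follows. This is only an illustrative sketch, not the implementation of any particular job scheduler: the `Job` fields, the convention that a larger `priority` value is dispatched first, and the choice to skip a job that does not fit rather than stop at it are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # assumed convention: a larger value is dispatched first
    cpus: int      # number of CPUs the job requests


def schedule(queue, free_cpus):
    """Run one scheduling pass over the queue.

    Jobs are visited in descending priority order. A job is dispatched when
    enough CPUs are free; otherwise it stays in the queue (assumption: the
    pass continues with lower-priority jobs instead of stopping).
    """
    dispatched, remaining = [], []
    # sorted() is stable, so jobs with equal priority keep submission order.
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        if job.cpus <= free_cpus:
            free_cpus -= job.cpus
            dispatched.append(job)
        else:
            remaining.append(job)
    return dispatched, remaining, free_cpus


queue = [Job("a", 1, 4), Job("b", 3, 8), Job("c", 2, 8)]
dispatched, remaining, free = schedule(queue, free_cpus=12)
print([j.name for j in dispatched], [j.name for j in remaining], free)
```

In this sketch, step (3) would correspond to calling `schedule` again with the remaining queue whenever a job is submitted or a running job terminates and releases its CPUs.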
Here, when jobs are submitted, the priority levels of the individual jobs are basically determined on the basis of a submission order, which is the order in which the jobs are submitted, or on the basis of priority levels of users who own the jobs. Note that priority levels of users are priority levels that are determined among the users, and are different from priority levels of jobs. Priority levels of jobs are priority levels that are determined among jobs that belong to the same user.
However, when a dispatch order is determined only on the basis of the priority levels of jobs that were determined when the jobs were submitted, a phenomenon occurs in which a computational resource is monopolized by a user for which a high priority level is set or by a large number of jobs that have previously been submitted by a certain user. In order to prevent a computational resource from being monopolized by one such user, a typical job scheduler has a fair sharing function.
For example, fairness in assignment of a computational resource is maintained by dynamically changing priority levels of users as illustrated in FIG. 1.
The priority levels of users that dynamically change are referred to as “dynamic priority levels”. For each user, a static priority level is set in accordance with the degree of importance of the user. The static priority level is determined on the basis of a utilizable capacity of a computational resource per fixed time period (for example, one year), which is determined for each user and is represented, for example, by the number of central processing units (CPUs)×a time.
When a job is dispatched and execution of the job with a computational resource starts (at a time at which the job starts), a state in which a penalty is imposed is set for a user associated with the job. In other words, as illustrated in FIG. 1, in response to the start of execution of the job, the dynamic priority level of the user associated with the job decreases by an estimated utilization capacity of the computational resource for the job (represented, for example, by the number of utilized CPUs×a fixed time (an estimated execution time in the embodiments); hereinafter, simply referred to as an “estimated utilization capacity”). In the state in which a penalty is imposed, the dynamic priority level of the user decreases relative to the priority levels of the other users, and the places of jobs associated with the user are moved down in a dispatch order, which is the order in which jobs are dispatched. Note that the estimated utilization capacity is set by the user when the job is submitted. The estimated execution time is the time expected to be taken until the job terminates. The number of utilized CPUs is the number of CPUs that the user utilizes for the job.
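As a minimal sketch of the penalty described above, the following computes the estimated utilization capacity as the number of utilized CPUs multiplied by the estimated execution time, and subtracts it from the dynamic priority level at the start of the job. The function names, the use of hours as the time unit, and the numeric values are assumptions chosen for illustration.

```python
def estimated_utilization(cpus: int, estimated_hours: float) -> float:
    """Estimated utilization capacity: utilized CPUs x estimated execution time."""
    return cpus * estimated_hours


def apply_penalty(dynamic_priority: float, cpus: int, estimated_hours: float) -> float:
    """Dynamic priority level of the user immediately after the job starts."""
    return dynamic_priority - estimated_utilization(cpus, estimated_hours)


# A user at priority 1000 starts a job on 16 CPUs estimated to run 24 hours.
print(apply_penalty(1000, cpus=16, estimated_hours=24))  # prints 616
```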
After a penalty is imposed (after the dynamic priority level is reduced) in response to the start of execution of the job, the dynamic priority level of the user associated with the job gradually recovers with time at a recovery rate that is determined in accordance with the utilizable capacity of the computational resource per fixed time period. Accordingly, penalties are reduced sooner for a user for which a larger utilizable capacity of the computational resource per fixed time period is set. Referring to FIG. 1, the dynamic priority level linearly increases in a period from a time at which a job starts to a time at which a job finishes. Note that the maximum value of the dynamic priority level is the static priority level. Accordingly, after the dynamic priority level has reached the static priority level, the dynamic priority level does not increase further.
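The recovery described above can be sketched as a linear increase that is capped at the static priority level. The text only states that the recovery rate is determined in accordance with the utilizable capacity per fixed time period; the constant per-hour rate used here, and the function and parameter names, are assumptions for illustration.

```python
def recovered_priority(dynamic_priority: float,
                       static_priority: float,
                       recovery_rate: float,
                       elapsed: float) -> float:
    """Dynamic priority level after `elapsed` time units of linear recovery.

    The level never exceeds the static priority level.
    """
    return min(static_priority, dynamic_priority + recovery_rate * elapsed)


# Starting from 616 with an assumed rate of 8 per hour and a static level of 1000:
print(recovered_priority(616, 1000, recovery_rate=8, elapsed=24))   # prints 808
print(recovered_priority(616, 1000, recovery_rate=8, elapsed=100))  # prints 1000
```

With a larger recovery rate, the same penalty is worked off in less time, which matches the statement that a user with a larger utilizable capacity recovers sooner.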
In the scheme illustrated in FIG. 1, a utilizable capacity of a computational resource per fixed time period is set for each user. For a user who has utilized the computational resource so that the utilized capacity exceeds the utilizable capacity, the places of jobs associated with the user are moved down in a dispatch order (that is, even when the fixed time period has elapsed, the dynamic priority level of the user cannot reach the static priority level, although the dynamic priority level recovers). Accordingly, the fair sharing function can be realized (dynamic priority levels can be adjusted) so that the computational resource can be utilized fairly within a range in which individual users are allowed to utilize the computational resource.
However, in an environment in which a scheme for controlling dynamic priority levels such as the scheme illustrated in FIG. 1 is realized, suppose a case in which a large number of jobs associated with a certain user (hereinafter, referred to as a “user A”) are submitted in a state in which a large amount of free capacity of a computational resource (free capacity of CPUs) exists. In this case, because no jobs associated with the other users exist, there is a possibility that the large number of jobs associated with the user A are executed even in a state in which a large number of penalties are imposed on the user A.
As a result, the dynamic priority level of the user A changes as illustrated below. FIG. 2 is a graph illustrating an example of changes in dynamic priority levels in a case in which a large number of jobs associated with a specific user are submitted.
Referring to FIG. 2, the dynamic priority level of the user A decreases at times a, b, c, d, e, and f. The reason for this is that jobs associated with the user A are dispatched at the individual times. As a result, the dynamic priority level of the user A markedly decreases and, as illustrated, does not recover for a while. Accordingly, when jobs are subsequently submitted by other users, there is a possibility that a state in which the jobs associated with the other users are placed ahead of a job associated with the user A in a dispatch order continues for a long time.
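The behavior described for the user A can be reproduced with a small simulation. This is a sketch under stated assumptions and is not taken from FIG. 2: the static priority level, the recovery rate, the dispatch times, and the penalty sizes are invented numbers, and time advances in unit steps.

```python
def simulate(static_priority, recovery_rate, penalties, horizon):
    """Return the dynamic priority level at each time step 0..horizon.

    `penalties` maps a time step to the estimated utilization capacity of the
    job dispatched at that step (hypothetical values).
    """
    level = static_priority
    trace = []
    for t in range(horizon + 1):
        if t > 0:
            # Linear recovery, capped at the static priority level.
            level = min(static_priority, level + recovery_rate)
        # The penalty is imposed at the time at which the job starts.
        level -= penalties.get(t, 0)
        trace.append(level)
    return trace


# Six jobs of user A are dispatched back to back into free capacity.
penalties = {t: 200 for t in range(1, 7)}
trace = simulate(static_priority=1000, recovery_rate=10,
                 penalties=penalties, horizon=130)
print(trace[6])               # prints -150, the level after the sixth dispatch
print(trace.index(1000, 7))   # prints 121, the first step at which A has fully recovered
```

Under these assumed numbers, a user whose dynamic priority level is still at the static level of 1000 would outrank the user A for more than a hundred time steps after the last dispatch, even though the user A only consumed capacity that was otherwise idle.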
However, submitting a job in a state in which free capacity of a computational resource exists leads to effective utilization of the computational resource. Thus, it can be considered that imposing a penalty such as the above-described penalty on a user who has utilized a computational resource in an ideal manner is unduly harsh. Accordingly, it can be considered appropriate to provide some remedial means for the user.