The present invention relates generally to the field of task management, and more particularly to resource allocation.
Most modern parallel job schedulers give jobs static resource allocations; that is, a job is allocated the number of independent resource sets (herein called nodes) that it requested in its job script and uses these resources in a dedicated manner throughout its execution. The widely used backfill algorithm, based on the principle of first-come, first-served (FCFS), does the following: (i) maintains jobs in the order of their arrival in the job queue and schedules them in that order if possible; (ii) upon a job's completion (or arrival, if there are no jobs currently running and no jobs currently in the queue), dispatches jobs from the front of the queue and reserves resources for the first job in the queue that cannot be run due to insufficient resource availability (the “queue top job”); (iii) based on the user-estimated wall times of the running jobs, calculates the backfill time window (user runtime estimates are inherently inaccurate, so backfill windows may be left unpopulated when users overestimate runtimes); and (iv) traverses the job queue and schedules jobs that can fit into the backfill window and whose execution will not interfere with the advance resource reservation of the “queue top job” (such jobs should either complete before the reserved start time of the “queue top job” or occupy only resources that the advance reservation does not need to use).
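By way of illustration only, steps (ii) through (iv) of the backfill algorithm described above may be sketched as a single scheduling pass. The sketch below is a simplified example under stated assumptions and not a definitive implementation of any particular scheduler: the names `Job`, `backfill_pass`, and `shadow_time` are hypothetical, nodes are treated as interchangeable, and the user-estimated wall times are taken at face value.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int     # nodes requested in the job script
    walltime: int  # user-estimated wall time (may be an overestimate)


def backfill_pass(queue, running, free_nodes, now):
    """Run one FCFS-with-backfill scheduling pass (illustrative sketch).

    queue      -- waiting jobs in arrival order
    running    -- list of (estimated_end_time, nodes) for running jobs
    free_nodes -- number of currently idle nodes
    Returns (names of jobs started, reserved start time of the queue top job).
    """
    queue = list(queue)
    running = list(running)
    started = []

    # Step (ii): dispatch jobs from the front of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.walltime, job.nodes))
        started.append(job.name)
    if not queue:
        return started, None

    # Step (iii): find the earliest time at which the queue top job fits,
    # based on the estimated end times of the running jobs.
    top = queue[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= top.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started, None  # top job asks for more nodes than exist
    extra = avail - top.nodes  # nodes the reservation does not need

    # Step (iv): backfill later jobs that do not delay the reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_in_window = now + job.walltime <= shadow_time
        if ends_in_window or job.nodes <= extra:
            free_nodes -= job.nodes
            if not ends_in_window:
                extra -= job.nodes  # job will still hold nodes at shadow_time
            started.append(job.name)
    return started, shadow_time
```

For example, with ten nodes in total, one running job holding six nodes until time 100, and a queue of A (8 nodes, wall time 50), B (2 nodes, 80), C (3 nodes, 200) and D (2 nodes, 500), this sketch reserves a start time of 100 for A and backfills B (which completes inside the window) and D (which occupies only nodes the reservation does not need).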
Technical computing is increasingly oriented towards very large data sizes, with big data analytics emerging as a cutting-edge technology. A large proportion of jobs in the big data analytics area are embarrassingly parallel (EP) jobs. In parallel computing, an embarrassingly parallel workload (or embarrassingly parallel problem) is one for which little or no effort is required to separate the problem into a number of parallel sub-tasks. This is often the case where no dependency (or communication) need exist between the parallel sub-tasks.
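As a minimal illustration of an embarrassingly parallel workload, the following sketch splits a list of inputs into independent chunks, processes each chunk without any communication between sub-tasks, and combines the partial results only at the end. The function names `split`, `sub_task`, and `run_ep` are hypothetical, and the thread pool merely stands in for a set of nodes; it is an assumption made for the sake of a self-contained example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, rem = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < rem else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sub_task(chunk):
    """Independent sub-task: needs no data from any other chunk."""
    return sum(x * x for x in chunk)


def run_ep(items, workers):
    """Run the sub-tasks on `workers` workers and combine the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sub_task, split(items, workers)))
    return sum(partials)
```

Because the sub-tasks share nothing, the combined result is the same regardless of how many workers are used, which is the property that makes such jobs natural candidates for running on a varying number of nodes.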
Some conventional schedulers employ a process in which an EP job is scheduled to start when a minimum number of resources is available. Methods that dynamically and adaptively schedule jobs with the aim of filling available resources optimally often employ the concept of “resizable jobs.” In such schedulers, jobs can shrink or expand to accommodate the changing pattern of resource availability. For example, once an EP job is started, it runs continuously until it is completed. During this time, the EP job can be: (i) expanded by dynamically assigning it more resources (“expanding”); and/or (ii) shrunk by dynamically taking away some resources (“shrinking”), for example, if resources are required for other high-priority jobs.
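The start, expand, and shrink operations described above may be sketched as follows. This is an illustrative model only: the class `ResizableJob` and its methods are hypothetical, progress is assumed to scale linearly with the number of nodes (a simplification that is plausible for EP jobs), and it is further assumed that a running job is never shrunk below its minimum node count.

```python
from dataclasses import dataclass


@dataclass
class ResizableJob:
    min_nodes: int   # minimum number of nodes required to start
    work_left: int   # remaining work in node-ticks (assumes linear scaling)
    nodes: int = 0   # nodes currently held; 0 means not yet started

    def try_start(self, free_nodes):
        """Start once the minimum is available; return nodes taken."""
        if self.nodes == 0 and free_nodes >= self.min_nodes:
            self.nodes = self.min_nodes
            return self.min_nodes
        return 0

    def expand(self, free_nodes):
        """Absorb idle nodes into a running job; return nodes taken."""
        if self.nodes == 0:
            return 0
        self.nodes += free_nodes
        return free_nodes

    def shrink(self, wanted):
        """Release up to `wanted` nodes (assumed: never below min_nodes)."""
        released = max(0, min(wanted, self.nodes - self.min_nodes))
        self.nodes -= released
        return released

    def tick(self):
        """Advance one time unit; each held node completes one work unit."""
        self.work_left = max(0, self.work_left - self.nodes)
```

In this model, a job with a two-node minimum does not start while only one node is free, starts on two nodes when five become free, may then expand onto the three remaining idle nodes, and later gives those three nodes back when a high-priority job requests four.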