Supercomputers and large-scale computer cluster systems are expensive to acquire, maintain, and operate, so it is important to utilize these resources optimally. Job scheduling techniques on these systems are designed to maximize utilization while keeping the time users wait for access to the resources to a minimum. These systems typically have many applications of different sizes, requiring different amounts of resources, waiting to run in the queue. Research shows that applications that require large amounts of resources also run longer on the system. Research also shows that the fraction of small and medium scale jobs is much larger than the fraction of large scale jobs. Thus, the small jobs tend to wait longer in the queues while a few larger jobs occupy the system for longer durations. The small jobs are also usually submitted by interactive users, whose productivity depends on the completion time of these jobs. Accordingly, in these systems, timely completion of a single job is less important, and increasing the job throughput is increasingly important.
In current large-scale systems, backfill schedulers are used to maximize resource utilization while preventing excessive delays in starting large jobs. In a system with a backfill scheduler, jobs are allocated resources according to their priority in the queue. The highest priority job may be unable to start immediately when some resources are available but not enough to satisfy its request. The backfill technique calculates the earliest future time at which all of the resources required by that job will be available, and then attempts to backfill other jobs that fit within the currently available resources and that will finish before the earliest start time of the highest priority job. In this way, backfill ensures that it does not delay the start of the highest priority job.
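
The backfill decision described above can be illustrated with a minimal Python sketch. It is not taken from any particular scheduler. It assumes a single resource type (nodes), accurate user-supplied runtime estimates, and a queue already sorted by priority. The names `Job`, `earliest_start`, and `backfill` are hypothetical and chosen only for this illustration.

```python
from typing import List, NamedTuple, Tuple


class Job(NamedTuple):
    """Hypothetical job record: name, nodes requested, estimated runtime."""
    name: str
    nodes: int
    runtime: int


def earliest_start(head: Job, free_nodes: int,
                   running: List[Tuple[int, int]], now: int) -> int:
    """Earliest time at which `head` can obtain all of its nodes.

    `running` holds (end_time, nodes) pairs for jobs currently executing.
    """
    available = free_nodes
    if available >= head.nodes:
        return now
    # Release nodes in order of job completion until the head job fits.
    for end_time, nodes in sorted(running):
        available += nodes
        if available >= head.nodes:
            return end_time
    raise ValueError("job requests more nodes than the system provides")


def backfill(queue: List[Job], free_nodes: int,
             running: List[Tuple[int, int]], now: int = 0) -> List[Job]:
    """Return the jobs to start at time `now`; `queue` is in priority order."""
    queue = list(queue)
    running = list(running)
    started = []
    # First, start jobs strictly in priority order while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job)
    if not queue:
        return started
    # The head job is blocked: compute its earliest possible start time.
    reserved_start = earliest_start(queue[0], free_nodes, running, now)
    # Backfill lower priority jobs that fit in the idle nodes and finish
    # no later than the head job's earliest start, so they cannot delay it.
    for job in queue[1:]:
        if job.nodes <= free_nodes and now + job.runtime <= reserved_start:
            free_nodes -= job.nodes
            started.append(job)
    return started


# Assumed scenario: 10-node system, one running job holds 6 nodes until t=10.
queue = [Job("A", 8, 20), Job("B", 2, 5), Job("C", 3, 15), Job("D", 2, 10)]
print([j.name for j in backfill(queue, free_nodes=4, running=[(10, 6)])])
# prints ['B', 'D']
```

In this assumed scenario, job A needs 8 nodes and cannot start until the running job ends at time 10. Jobs B and D fit in the 4 idle nodes and finish by time 10, so they are backfilled. Job C neither fits in the nodes left after B is placed nor finishes in time, so it waits.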