Distributed computing over a heterogeneous collection of resources has garnered substantial interest from various industries including, for example, technical and scientific computing, finance, agriculture, and manufacturing. A core component in such an environment is a job scheduler, which schedules workload and assigns resources to the workload.
Large-scale compute farms include thousands of resources such as central processing units (“CPUs”) and software licenses. Thousands of users submit tens or hundreds of thousands of individual jobs that need to be scheduled and run. This vast collection of individual jobs constitutes the workload of the compute farm.
In a highly utilized compute farm, it is common for the workload's demand for resources to exceed the available capacity. Scheduling several thousand jobs according to specified service level agreements (“SLAs”) and other policies, priorities and constraints is a problem that has been the subject of significant research. Various vendors and open source projects have developed products to address the scheduling problem.
As can be understood from FIG. 1, which is a diagrammatic depiction of a prior art system 100 for allocating compute farm resources 105, the system 100 employs a typical compute farm job scheduler 110. Scheduling compute farm resources 105 typically involves the scheduler 110 reviewing the workload 115 (i.e., the collection of pending individual jobs) and available compute farm resources 105 (i.e., CPUs, licenses, etc.), followed by allocating resources 125 such that individual jobs of the workload 115 are assigned to specific time periods 130 with specific resources 105. The allocating of resources 125 is based on the available resources 105 as well as scheduling policies 135, constraints 140, priorities 145, and SLAs 150. A set of such operations is repeated over several iterations. At the beginning of each iteration, new jobs and resources as well as the updated status from existing jobs and resources are collected.
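The iterative loop described above can be illustrated with a minimal sketch. It is not the scheduler 110 of FIG. 1: the `Job` fields, the function name `schedule_iteration`, and the greedy highest-priority-first rule are all assumptions chosen for illustration, and policies 135, constraints 140 and SLAs 150 are reduced here to a single priority number and a CPU count.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int  # assumed convention: a higher value is scheduled first
    cpus: int      # number of CPUs the job needs


def schedule_iteration(pending, free_cpus):
    """Run one hypothetical scheduling iteration.

    Walks the pending jobs in descending priority order and starts each
    job that still fits in the remaining free CPUs. Returns the started
    jobs, the jobs left pending, and the CPUs still free.
    """
    started, still_pending = [], []
    for job in sorted(pending, key=lambda j: -j.priority):
        if job.cpus <= free_cpus:
            free_cpus -= job.cpus
            started.append(job)
        else:
            still_pending.append(job)
    return started, still_pending, free_cpus


# Example workload (made-up values) and one iteration over 6 free CPUs.
pending = [Job("a", 1, 2), Job("b", 5, 4), Job("c", 3, 3)]
started, still_pending, free = schedule_iteration(pending, free_cpus=6)
```

In a real scheduler this function would be called repeatedly, with newly submitted jobs and freed resources merged into `pending` and `free_cpus` at the start of each iteration.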
In a large-scale environment, two aspects of scheduling are important. First, all SLAs, policies, priorities and constraints must be met in order for the scheduler to correctly perform its job. Second, scheduling needs to be quick in order to keep the utilization high. A large-scale compute farm has a higher overall job completion rate than a smaller farm for the same types of jobs. Longer scheduling time will result in a higher number of resources being left idle while the next scheduling iteration completes.
The importance of scheduling speed is evidenced by the following example. Consider a 5,000 CPU compute farm catering to electronic design automation (“EDA”) jobs submitted by 1,000 microprocessor designers from 20 distinct projects. In such an environment, a large percentage of the jobs typically run for 20-30 minutes. Assuming 30 minutes as an average runtime, there would be 10,000 jobs completed via the 5,000 CPUs in an hour. On average, therefore, about 166 CPUs would become idle each minute. This implies the farm would be leaving about 166 CPUs, or 3.33% of the compute farm capacity, idle if a scheduling iteration takes 60 seconds. If the scheduling iterations take 5 minutes, then about 833 CPUs, or 16.66% of the compute farm capacity, would be left idle constantly. A similar percentage of licenses would also be left unused, as can be seen by repeating the calculations for licenses instead of CPUs. The result is inefficiency due to idle resources. More resources, for example, CPUs and licenses, would have to be purchased to perform the same amount of work. Such resources are expensive, and inefficient utilization of resources has a negative impact on time to market, amongst other negative impacts.
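The arithmetic in this example can be reproduced with a short sketch. The function name `idle_cpus` and its parameters are hypothetical, and the calculation rests on the same simplifying assumptions as the text: every CPU is busy, job completions are spread evenly over time, and a freed CPU stays idle until the current scheduling iteration finishes.

```python
def idle_cpus(total_cpus, avg_runtime_min, iteration_sec):
    """Estimate CPUs left idle while one scheduling iteration runs.

    Returns a tuple (idle CPU count, idle share of the farm in percent).
    """
    # With every CPU busy, total_cpus / avg_runtime_min jobs finish per minute.
    freed_per_min = total_cpus / avg_runtime_min
    # CPUs freed during the iteration wait for the next scheduling pass.
    idle = freed_per_min * (iteration_sec / 60)
    return idle, 100 * idle / total_cpus


one_minute = idle_cpus(5000, 30, 60)     # 60-second iteration
five_minutes = idle_cpus(5000, 30, 300)  # 5-minute iteration
```

For the 60-second iteration this gives roughly 166.7 CPUs, or about 3.33% of the farm; for the 5-minute iteration it gives roughly 833.3 CPUs, which the text truncates to 16.66% and which rounds to 16.67%.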
Long scheduling iterations present other problems besides inefficient utilization of compute farm resources. For example, long scheduling iterations result in newly submitted jobs having to wait until the start of the next iteration to be scheduled. Long iteration times negatively impact jobs with a high priority or of an interactive nature that need to be scheduled and run immediately.
Scheduling a large number of jobs over a large number of CPUs and licenses is a time consuming operation. A state of the art scheduler performs numerous operations and evaluates several scheduling scenarios within each iteration. Several factors can increase the time to complete one scheduling iteration. These factors include: the number of jobs waiting for resource allocation; the total number of CPUs; the number of idle CPUs on which to schedule jobs; the number of licenses available for jobs to be scheduled; the variety of the job mix (i.e., the number of distinct types of jobs); and the number and type of scheduling constraints. Scheduling constraints can include share tree based project allocation; limits on resource usage per user, project, job type, etc.; and time and/or data based dependencies.
A large scale, highly utilized farm exacerbates the problems associated with the efficient use of compute farm resources by increasing the numbers for all of the above-listed factors. The faster the scheduling iteration, the easier it is to utilize all of the CPUs and licenses that become idle. A slower scheduling iteration can make every subsequent iteration longer due to an increasing number of idle CPUs. This situation can spiral and result in very low utilization of the farm. The benefits of farm based computing then disappear.
A significant amount of research has focused on the sophistication as well as accuracy of job scheduling algorithms for compute farms and parallel job environments. Algorithms and techniques have been proposed to achieve optimizations in resource utilization, but have not directly addressed the effect of scheduling iteration time on resource utilization. Two relevant examples are included below.
In a 2004 IEEE workshop, a two stage static-dynamic optimization of job scheduling and assignment of resources was proposed. Such job scheduling employs a technique that achieves sophisticated scheduling of jobs by combining complex algorithms including advance reservation as well as back filling. Each scheduling iteration evaluates several critical job attributes to calculate global priorities that are automatically normalized. High utilization of resources was achieved in each scheduling iteration as shown by a sample scenario with complex requirements. However, the speed of the scheduling algorithm was not addressed. In a large scale compute farm, the number of calculations that would need to be performed would rise dramatically, thereby leaving more resources idle while the next set of schedules is determined. For greater detail regarding the proposed two stage optimization, see Lev Markov, “Two Stage Optimization of Job Scheduling and Assignment in Heterogeneous Compute Farms,” Proceedings of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems (FTDCS '04) 2004. This reference is incorporated by reference in its entirety into the present application.
In another workshop in 1999, a strategy was proposed for designing a job scheduling system. The scheduling system included three critical parts: a scheduling policy, an objective function, and a scheduling algorithm. The policy captures resource allocation rules, generally defined by resource owners and/or administrators. The objective function captures a measure of the adequacy of the system-generated schedules. The scheduling algorithm generates valid schedules for the jobs over the available resources. However, the proposed strategy fails to address the impact of the speed of the algorithm on resource utilization. Generating good schedules was the primary focus. For greater detail regarding the proposed strategy, see J. Krallman, U. Schwiegelshohn, R. Yahyapur, “On the Design and Evaluation of Job Scheduling Algorithms,” 5th Workshop on Job Scheduling Strategies for Parallel Processing, pp. 17-42, 1999. This reference is incorporated by reference in its entirety into the present application.
There is a need in the art for an apparatus and system for rapid resource scheduling in a compute farm. There is also a need in the art for a method for rapid resource scheduling in a compute farm.