A massively parallel processor can include a large number (even thousands) of central processing units, possibly grouped into nodes. A well-known massively parallel processor is the BlueGene/L™ of International Business Machines™ of Armonk, N.Y.
A massively parallel processor is expected to execute multiple computer tasks (jobs) in parallel by using multiple processing units. Usually, multiple users can send requests to execute various jobs. These requests are queued in multiple queues and a scheduler selects (i) which head-of-queue job to execute (job selection), and (ii) which resources shall participate in the execution of the job (resource allocation).
These two decisions are traditionally made by mutually independent algorithms. Typically, the job selection is responsive to various priorities, while the resource allocation is responsive to the topology of the massively parallel processor.
The scheduler, and especially the job selection, can be responsive to various parameters including user priority, user group priority, job priority, time of arrival of the job request, and the like. The performance of the massively parallel processor is largely dependent upon the efficiency of the scheduler. A badly designed scheduler can use only a fraction of the resources of the massively parallel processor, and can otherwise utilize the massively parallel processor in an inefficient (including time-wise) manner.
Various algorithms were developed for job selection. One prior art method for job selection is known as backfilling. The following papers, all being incorporated herein by reference, illustrate some backfilling methods: A. W. Mualem and D. G. Feitelson, “Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling”, IEEE Trans. Parallel and Distributed Syst. 12(6), pp. 529-543, 2001; J. Skovira, W. Chan, H. Zhou, and D. Lifka, “The EASY LoadLeveler API Project”, JSSPP 1996 pp. 41-47; D. Lifka, “The ANL/IBM SP Scheduling System”, JSSPP 1995 pp. 295-303; and Edi Shmueli and Dror G. Feitelson, “Backfilling with Lookahead to Optimize the Performance of Parallel Job Scheduling”, JSSPP 2003 pp. 228-251.
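By way of a non-limiting illustration, the following Python sketch shows one common form of backfilling, in the spirit of the EASY approach discussed in the papers cited above. Jobs are started in arrival order while they fit. When the head-of-queue job does not fit, the sketch computes the earliest time at which enough nodes will be free for it (called the shadow time here) and lets later jobs start out of order only if they do not delay that start. The sketch is simplified and is not the implementation of any cited scheduler: it assumes a single queue, ignores topology, and trusts user runtime estimates. The names `Job`, `easy_backfill`, `running` and `queue` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # user-supplied runtime estimate


def easy_backfill(total_nodes, running, queue, now=0):
    """Return the names of queued jobs to start at time `now`.

    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of Job in arrival order (hypothetical single queue).
    """
    running = list(running)
    queue = list(queue)
    free = total_nodes - sum(n for _, n in running)
    started = []

    # Start jobs strictly in arrival order while the head job fits.
    while queue and queue[0].nodes <= free:
        job = queue.pop(0)
        free -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # The head job does not fit: find when enough nodes become free for it.
    head = queue[0]
    avail, shadow = free, None
    for end, n in sorted(running):
        avail += n
        if avail >= head.nodes:
            shadow = end
            break
    if shadow is None:  # head job asks for more nodes than the machine has
        return started
    extra = avail - head.nodes  # nodes the head job will not need at shadow time

    # Backfill: a later job may start now if it does not delay the head job.
    for job in queue[1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.nodes  # finishes before the head job is due to start
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes  # runs past shadow time, but only on spare nodes
            extra -= job.nodes
            started.append(job.name)
    return started
```

For example, on a 10-node machine with 6 nodes busy until time 10, a head job requesting 8 nodes must wait, but a later job requesting 4 nodes for 5 time units can be backfilled because it is expected to finish before the head job could start.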
There are various resource allocation methods that are usually used when the topology of the massively parallel processor is not trivial, some being illustrated in the following papers, all being incorporated herein by reference: “An Efficient Task Allocation Scheme for 2D Mesh Architectures”, S. Yoo et al., IEEE Trans. on Parallel and Distributed Systems, v. 8(9), pp. 934-942, 1997; “A Fast and Efficient Strategy for Submesh Allocation in Mesh-Connected Parallel Computers”, D. Das Sharma and D. K. Pradhan, IEEE Symp. Parallel and Distributed Processing, pp. 682-689, 1993; “Submesh Allocation in Mesh Multicomputers Using Busy List: A Best-Fit Approach with Complete Recognition Capability”, D. Das Sharma and D. K. Pradhan, Journal of Parallel and Distributed Computing, v. 36, pp. 106-118, 1996; “Job Scheduling in Mesh Multicomputers”, D. Das Sharma and D. K. Pradhan, IEEE Trans. on Parallel and Distributed Systems, v. 9(1), pp. 57-70, 1998; “A Submesh Allocation Scheme for Mesh-Connected Multiprocessor Systems”, T. Liu et al., Proc. 1995 Int'l Conf. Parallel Processing, v. 2, pp. 159-163, 1995; “On Submesh Allocation for Mesh Multicomputers: A Best-Fit Allocation and a Virtual Submesh Allocation for Faulty Meshes”, G. Kim and H. Yoon, IEEE Trans. on Parallel and Distributed Systems, v. 9(2), 1998; “Job Scheduling for the BlueGene/L System”, E. Krevat, J. G. Castanos, and J. E. Moreira, Job Scheduling Strategies for Parallel Processing workshop, Lecture Notes in Computer Science v. 2537, pp. 38-54, Springer-Verlag, 2002; and “Multi-Toroidal Interconnects: Using Additional Communication Links to Improve Utilization of Parallel Computers”, Y. Aridor et al., JSSPP 2004 pp. 72-88.
These resource allocation algorithms can be categorized into two groups named “First-Fit” and “Best-Fit”. The basic difference between the various resource allocation algorithms lies in the way, or the order, in which the resources of the massively parallel processor are scanned, and in the heuristics used to decide which allocation is the “best”.
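The distinction between the two groups can be illustrated by the following Python sketch, which allocates an `h` by `w` submesh in a two-dimensional mesh. Both functions scan candidate positions in the same row-major order; the first-fit variant returns the first free position found, while the best-fit variant scores every free position and returns the highest-scoring one. The scoring heuristic used here (counting neighboring cells that are busy or lie outside the mesh, so that allocations are packed against existing boundaries) is merely one assumed example and is not taken from any of the cited papers. All function names and the `busy` grid representation are hypothetical.

```python
def is_free(busy, r, c, h, w):
    """True if every node of the h-by-w submesh based at (r, c) is idle."""
    return all(not busy[i][j] for i in range(r, r + h) for j in range(c, c + w))


def candidates(busy, h, w):
    """Yield the base (row, col) of each free h-by-w submesh, in row-major order."""
    rows, cols = len(busy), len(busy[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if is_free(busy, r, c, h, w):
                yield (r, c)


def first_fit(busy, h, w):
    """Return the first free submesh found by the scan, or None."""
    return next(candidates(busy, h, w), None)


def contact(busy, r, c, h, w):
    """Assumed heuristic: count bordering cells that are busy or off the mesh."""
    rows, cols = len(busy), len(busy[0])

    def blocked(i, j):
        return not (0 <= i < rows and 0 <= j < cols) or busy[i][j]

    score = 0
    for j in range(c, c + w):  # cells just above and just below the submesh
        score += blocked(r - 1, j) + blocked(r + h, j)
    for i in range(r, r + h):  # cells just left and just right of the submesh
        score += blocked(i, c - 1) + blocked(i, c + w)
    return score


def best_fit(busy, h, w):
    """Return the free submesh with the highest contact score, or None."""
    return max(candidates(busy, h, w),
               key=lambda p: contact(busy, *p, h, w),
               default=None)


# A 4x4 mesh whose lower-right 2x2 block is already allocated.
F, T = False, True
mesh = [
    [F, F, F, F],
    [F, F, F, F],
    [F, F, T, T],
    [F, F, T, T],
]
print(first_fit(mesh, 2, 2), best_fit(mesh, 2, 2))
```

In this example the first-fit scan stops at the upper-left corner, whereas the assumed best-fit heuristic prefers a position wedged between the mesh boundary and the busy block, which tends to leave a larger contiguous free region.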
There is a need to provide efficient systems, methods and computer program products for job selection and resource allocation of a massively parallel processor.