In recent years, supercomputers, which serve as information processing apparatuses that perform scientific and technical computation, have come to include, for example, as many as tens of thousands of computing nodes. Each computing node is connected to a mesh network. One type of mesh network is formed as an assembly of a plurality of building blocks that together constitute the whole mesh network. Such a mesh network can therefore be extended in building block units, i.e., it has the characteristic of high extensibility.
With supercomputers, to prevent communication interference between jobs, a system is divided into rectangular or cuboid partial regions that constitute part of the mesh network (hereinafter, “submesh”), and the divided systems execute the jobs that are allocated to them. However, allocating jobs to submeshes causes fragmentation in the system, whereby there are submeshes to which jobs cannot be allocated. This may cause a reduction in the activity ratio of the system.
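The effect of fragmentation can be illustrated with a minimal sketch. It assumes a two-dimensional mesh represented as a grid of busy flags and a simple first-fit search. The function name `find_free_submesh` and the grid layout are hypothetical and chosen only for illustration; they are not taken from any of the cited documents.

```python
from typing import List, Optional, Tuple


def find_free_submesh(
    busy: List[List[bool]], width: int, height: int
) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the first free width-by-height submesh, or None.

    busy[y][x] is True when the node at column x, row y is already allocated.
    This is a brute-force first-fit scan, used here only as an illustration.
    """
    rows = len(busy)
    cols = len(busy[0]) if rows else 0
    for y in range(rows - height + 1):
        for x in range(cols - width + 1):
            if all(
                not busy[y + dy][x + dx]
                for dy in range(height)
                for dx in range(width)
            ):
                return (x, y)
    return None


# A fragmented 4 x 4 mesh: half of the nodes are free, but no two free nodes
# are adjacent in a way that forms a 2 x 2 rectangle.
fragmented = [
    [True, False, True, False],
    [False, True, False, True],
    [True, False, True, False],
    [False, True, False, True],
]
free_nodes = sum(not cell for row in fragmented for cell in row)
placement = find_free_submesh(fragmented, 2, 2)
```

In this example eight nodes are free, yet a job that needs a 2 x 2 submesh cannot be placed, which is the situation in which the activity ratio drops.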
In the field of scientific and technical computation (also called the High Performance Computing (HPC) field) in which supercomputers operate, the effect of fragmentation on submeshes is particularly serious because the supercomputers operate continuously while executing various jobs. Accordingly, to alleviate the fragmentation, a technology called backfilling is used in job scheduling.
In job scheduling, the execution of jobs is controlled. In one scheduling method, small scale jobs, or jobs given low priority from the beginning, that have been waiting for a long time because a large scale job arrived first or because a job given high priority from the beginning is being executed have their priority raised so that they are executed before large scale jobs. Here, the term “large scale” means that the “processing time is relatively long”, whereas the term “small scale” means that the “processing time is relatively short”. This scheduling method is called backfilling.
Two backfilling algorithms have been proposed: conservative and aggressive. Aggressive backfilling ensures an execution start time only for the highest priority queued job, whereas conservative backfilling ensures an execution start time for all queued jobs. Conservative backfilling therefore has two advantages: it avoids the starvation phenomenon, in which a specific job is never executed, and it can present users with an execution start time for their queued jobs. However, compared with aggressive backfilling, conservative backfilling needs a large amount of computational effort and thus disadvantageously affects scheduling performance.
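As a minimal sketch of aggressive backfilling, the following assumes that resources are plain node counts (mesh topology is ignored) and that every job declares its runtime in advance. The `Job` class, the function `aggressive_backfill`, and the tuple layout of `running` are hypothetical names introduced for illustration. Only the first job that cannot start receives a guaranteed start time; later jobs are started now only if they do not delay it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes requested (topology ignored in this sketch)
    runtime: int  # declared runtime


def aggressive_backfill(
    now: int, total_nodes: int, running: List[Tuple[int, int]], queue: List[Job]
) -> List[str]:
    """Return the names of queued jobs to start at time `now`.

    running: (end_time, nodes) for each running job.
    queue:   queued jobs in descending order of priority.
    """
    free = total_nodes - sum(n for _, n in running)
    active = list(running)
    started: List[str] = []

    # Start jobs from the head of the queue for as long as they fit.
    i = 0
    while i < len(queue) and queue[i].nodes <= free:
        job = queue[i]
        free -= job.nodes
        active.append((now + job.runtime, job.nodes))
        started.append(job.name)
        i += 1
    if i == len(queue):
        return started

    # Guarantee a start time only for the first blocked job (the head).
    head = queue[i]
    avail, shadow = free, None
    for end, n in sorted(active):
        avail += n
        if avail >= head.nodes:
            shadow = end  # earliest time at which the head job can start
            break
    if shadow is None:
        return started  # head requests more nodes than the system has
    extra = avail - head.nodes  # nodes the head will not need at `shadow`

    # One pass over the remaining jobs; jobs that cannot start are skipped.
    for job in queue[i + 1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.nodes  # finishes before the head job starts
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes  # uses only nodes the head does not need
            extra -= job.nodes
            started.append(job.name)
    return started


queue = [Job("A", 8, 5), Job("B", 2, 10), Job("C", 2, 20)]
result = aggressive_backfill(0, 8, [(10, 6)], queue)
```

With 8 nodes, of which 6 are busy until time 10, job A is guaranteed to start at time 10. Job B is backfilled because it finishes by then, whereas job C is left waiting with no guaranteed start time.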
Furthermore, with aggressive backfilling, because no operation is performed on queued jobs that cannot be started, the amount of computational effort is proportional to the number of queued jobs. In contrast, conservative backfilling reserves computational resources (hardware resources) for future use. Each reservation made by conservative backfilling involves two events, i.e., resource acquisition and resource release. The events are listed in order of occurrence and managed in a list called an event list.
With an algorithm for conservative backfilling, scheduling is performed by scanning the event list and searching for a period of time during which a computational resource (hardware resource) that is needed by a job can be ensured for the necessary duration. Accordingly, the amount of computational effort to perform conservative backfilling is proportional to the square of the number of queued jobs. With both aggressive and conservative backfilling, it is determined, in descending order of priority, whether queued jobs can be executed.
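The event list scan can be sketched as follows, again under the simplifying assumption that resources are plain node counts rather than submeshes. The names `Job`, `earliest_start`, and `conservative_schedule`, and the `(time, delta)` event encoding, are hypothetical. Each reservation adds an acquisition event and a release event to a list kept in order of occurrence, and every queued job receives a reserved start time.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (time, delta): delta > 0 is a resource acquisition, delta < 0 a release.
Event = Tuple[int, int]


@dataclass
class Job:
    name: str
    nodes: int
    runtime: int


def used_at(events: List[Event], t: int) -> int:
    """Nodes in use at time t; a release at exactly t counts as released."""
    return sum(delta for time, delta in events if time <= t)


def earliest_start(
    events: List[Event], now: int, total: int, nodes: int, runtime: int
) -> int:
    """Scan the event list for the first window [start, start + runtime)
    in which `nodes` nodes stay available for the whole duration."""
    candidates = sorted({now} | {t for t, _ in events if t > now})
    for start in candidates:
        end = start + runtime
        checkpoints = [start] + [t for t, _ in events if start < t < end]
        if all(used_at(events, t) + nodes <= total for t in checkpoints):
            return start
    raise ValueError("job requests more nodes than the system has")


def conservative_schedule(
    now: int, total: int, running: List[Tuple[int, int]], queue: List[Job]
) -> Dict[str, int]:
    """Reserve a start time for every queued job, in priority order.

    running: (end_time, nodes) for each job already running at `now`.
    """
    events: List[Event] = []
    for end, n in running:
        bisect.insort(events, (now, n))
        bisect.insort(events, (end, -n))
    starts: Dict[str, int] = {}
    for job in queue:
        start = earliest_start(events, now, total, job.nodes, job.runtime)
        bisect.insort(events, (start, job.nodes))                 # acquisition
        bisect.insort(events, (start + job.runtime, -job.nodes))  # release
        starts[job.name] = start
    return starts


queue = [Job("A", 8, 5), Job("B", 2, 10), Job("C", 2, 20)]
reservations = conservative_schedule(0, 8, [(10, 6)], queue)
```

In this sketch each job scans an event list whose length grows with the number of jobs already reserved, which is where the growth in computational effort with the number of queued jobs comes from. Unlike in the aggressive case, job C also receives a definite start time.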
A large amount of computational effort is needed for submesh allocation, and an even larger amount is needed when submesh allocation and backfilling are performed together. Accordingly, with the conventional technology, it is difficult to develop conservative backfilling for practical use that takes a mesh topology into consideration; therefore, aggressive backfilling or a simpler method is used. An example of the simpler method is a technology in which small jobs that execute in a short time are moved ahead for execution without taking priority into consideration and without making a reservation for submeshes.
[Patent Document 1] Japanese Laid-open Patent Publication No. 2005-310139
[Non-patent Document 1] Y. Zhu, “Efficient Processor Allocation Strategies for Mesh-Connected Parallel Computers”, Journal of Parallel and Distributed Computing, vol. 16, issue 4, pp. 328-337, December 1992.
[Non-patent Document 2] Lifka, D. A. “The ANL/IBM SP Scheduling System” In Proceedings of the Workshop on Job Scheduling Strategies For Parallel Processing D. G. Feitelson and L. Rudolph, Eds. Lecture Notes In Computer Science, vol. 949. Springer-Verlag, London, pp. 295-303, 1995.
[Non-patent Document 3] Mu'alem, A. W. and Feitelson, D. G. 2001. “Utilization, Predictability, Workloads, and User Runtime Estimates in Scheduling the IBM SP2 with Backfilling” IEEE Trans. Parallel Distrib. Syst. Vol. 12, No. 6, pp. 529-543, June 2001.
However, with the technology described above, there is a problem in that the activity ratio of the system that executes jobs is reduced. Specifically, with the technology in which small jobs are moved ahead without taking priority into consideration, starvation may occur for large jobs that take a long time to execute, so a policy control needs to be used in which delayed jobs are given increased priority. In this case, because priority is raised only for jobs that have already been delayed, a delay in starting the execution of large jobs cannot be avoided. Furthermore, large jobs that have been given higher priority ultimately cannot be moved ahead by backfilling. As a result, with the technology described above, the activity ratio of the system that executes jobs is reduced.