In a large parallel computer system, it is difficult to use a network, such as a crossbar interconnection network, in which “the performance of communication between multiple processors is hardly affected by the arrangement of the processors on the network”, because of the cost associated with the increased quantities of wirings and relay devices. This is because, in a network of such a configuration, the quantities of wirings and relay devices are proportional to the square of the number of processors. Thus, in many cases, a network such as a mesh network or a torus network is used, which has a connection configuration (or network topology) in which the quantities of wirings and relay devices are suppressed to be approximately proportional to the number of processors serving as arithmetic processing devices.
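The difference in growth can be illustrated with a minimal sketch. The function names are hypothetical, and the sketch assumes that the crossbar cost is counted as one crosspoint per pair of input and output ports and that every torus dimension has at least three nodes, so that each node has two distinct neighbors per dimension.

```python
from math import prod


def crossbar_crosspoints(n: int) -> int:
    """Crosspoints in a full n x n crossbar: one per (input, output) pair."""
    return n * n


def torus_links(dims: tuple[int, ...]) -> int:
    """Links in a torus with the given size in each dimension.

    Assumes every dimension has size >= 3. Each node then has two links
    per dimension, and each link is shared by two nodes, so the total is
    (number of nodes) * (number of dimensions).
    """
    if any(d < 3 for d in dims):
        raise ValueError("this sketch assumes every dimension has size >= 3")
    return prod(dims) * len(dims)


# 1024 processors: quadratic growth for the crossbar, linear for a 32 x 32 torus.
print(crossbar_crosspoints(1024))  # 1048576
print(torus_links((32, 32)))       # 2048
```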
Techniques for network topologies of large systems are still being developed. As a basic characteristic, such topologies are required, as in the large parallel computer system described above, to keep the quantities of wirings and relay devices approximately proportional to the number of processors. The topologies, however, are not limited to mesh and torus networks.
In such a large system, a set of processors, or a pair of a set of processors and an available time period, is generally assigned to each of multiple jobs, and the jobs are executed simultaneously. Depending on the size of the system, the maximum number of jobs is in many cases in a range from several thousand to several tens of thousands, or is otherwise significantly large. In this case, it is desirable that the set of processors assigned to a job be arranged on the network so that its communication traffic does not interfere with the traffic of the sets of processors assigned to other jobs. For example, if the network is a mesh network or a torus network, a method of assigning, to each job, a “mesh or torus (or sub-mesh or sub-torus) smaller than the network” is used in many cases.
The sets of processors to be assigned to jobs vary in the number of elements, in the positional relationships between the processors on the network, and the like. The periods of time during which the jobs are executed also vary. Thus, in a large system in which the positional and chronological relationships between available resources and assigned resources are likely to be complex, the process of managing resources for jobs is likely to become a bottleneck for the scheduling process. As a result, the resource management process may reduce the utilization rate of the system through reduced performance of the job scheduler, or may reduce the throughput of the system.
In order to reduce the processing time for scheduling, it is conceivable, for example, to execute in parallel a process of searching for resources to which a job can be assigned, or to divide among multiple threads the search range in which an assignable set of processors is searched for. However, when the search process is executed on resources in parallel, the following problems may occur.
Specifically, an available resource that could actually be assigned may not be detected, depending on the method of allotting the search range or on the procedure of the search process. For example, as conventional techniques that use a bitmap to manage available resources and assigned resources in a mesh or torus network, two methods are known: a method of shifting the positions to be searched at intervals corresponding to the shape parameters (or the sizes in the respective dimensions) of the available resource to be searched for, and a method of shifting the positions to be searched by one cell in each dimension.
With the former method, an available resource may be overlooked, because regions that start between the sampled positions are never examined. The latter method has the problem of a long search time.
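The trade-off between the two bitmap search methods described above can be illustrated with a minimal sketch. It assumes a two-dimensional mesh without wraparound, stored as a list of rows in which 1 marks an assigned cell and 0 an available cell. The functions `fits` and `search` and the sample grid are hypothetical and are not taken from the related art.

```python
def fits(grid, top, left, h, w):
    """Return True if the h x w block with top-left cell (top, left) is all free."""
    return all(grid[r][c] == 0
               for r in range(top, top + h)
               for c in range(left, left + w))


def search(grid, h, w, step_r, step_c):
    """Return the first free h x w block found, or None.

    Candidate origins advance by step_r rows and step_c columns.
    """
    rows, cols = len(grid), len(grid[0])
    for top in range(0, rows - h + 1, step_r):
        for left in range(0, cols - w + 1, step_c):
            if fits(grid, top, left, h, w):
                return (top, left)
    return None


# 4 x 4 mesh whose only free 2 x 2 block starts at row 1, column 1.
grid = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

# Former method: step equal to the block size; only 4 origins are examined.
coarse = search(grid, 2, 2, step_r=2, step_c=2)
# Latter method: step of one cell; up to 9 origins are examined.
fine = search(grid, 2, 2, step_r=1, step_c=1)

print(coarse)  # None: the free block is overlooked
print(fine)    # (1, 1)
```

In this sketch the coarse search examines fewer origins but misses the block, while the one-cell search finds it at the cost of examining every origin.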
In the process executed in parallel, if the ranges to be searched in parallel that are allotted to processing elements (processors, processor sets, cores of processors, sets of cores of processors, or the like) are not appropriate, an available resource that could be assigned may fail to be found.
In the process executed in parallel, if the proportion of time spent on the portion of the process that is not executed in parallel is large, the reduction in processing time achieved by the parallelization is diminished (Amdahl's law). Thus, it is preferable that the ratio of the time spent executing the process in parallel to the total time for the search process be high. The conventional technique, however, is not devised with this in mind.
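Amdahl's law can be written as a short function. This is a minimal sketch; the function name and parameter names are hypothetical, and `parallel_fraction` denotes the share of the single-worker run time that can be executed in parallel.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the single-worker run time that is parallelizable.
    workers: number of processing elements executing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# With 10% of the search left serial, even 1000 workers give less than a 10x speedup.
print(amdahl_speedup(0.9, 1000))
# With no serial part, the speedup equals the number of workers.
print(amdahl_speedup(1.0, 8))  # 8.0
```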
In the process executed in parallel, the time taken by the search process may vary from one search range to another, and scheduling performance may be limited by the longest processing time.
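The effect of uneven search ranges can be illustrated with a minimal sketch. It assumes that each search range is handled by its own processing element and that the overall search finishes only when the slowest range finishes. The function names and the sample times are hypothetical.

```python
def parallel_search_time(range_times):
    """Elapsed time of the parallel search: the slowest range determines it."""
    return max(range_times)


def balance_efficiency(range_times):
    """Fraction of the total worker time spent doing useful work (1.0 = balanced)."""
    return sum(range_times) / (len(range_times) * max(range_times))


# Four search ranges; one takes five times longer than the others.
times = [1.0, 1.0, 1.0, 5.0]
print(parallel_search_time(times))  # 5.0
print(balance_efficiency(times))    # 0.4
```

In this sketch, three of the four processing elements are idle for most of the search, so the parallel search takes as long as the single slowest range.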
It is considered that the time taken by the search process, including the determination of whether or not assignment is possible, depends on “the complexity of the assignment” or “the degree of progress of fragmentation of a resource region”. However, no method has been established for quantifying “the complexity of the assignment” or “the degree of progress of fragmentation” of the resources managed by a job scheduler, and the conventional technique does not solve the problem of quantifying them.
For example, the quantification of fragmentation in a memory region or a disk region is known as a conventional technique. It focuses on a single parameter that indicates “whether or not page numbers or block numbers of individual assigned regions (memory segments or files) or available regions are contiguous”, so the determination of whether or not fragmentation exists is relatively simple. In contrast, it is considered that multiple parameters related to “connection relationships of processors within a network” affect “the degree of progress of the fragmentation” of the resource regions managed by the job scheduler. Thus, a method that is the same as or similar to the method used for a memory or a disk is not applicable.
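The single-parameter check used for a memory or disk region can be sketched as follows. This is a minimal illustration, assuming a one-dimensional block map in which 1 marks an assigned block and 0 an available block. The function names are hypothetical and do not represent a particular memory manager or file system.

```python
def free_runs(block_map):
    """Lengths of the maximal runs of contiguous free (0) blocks."""
    runs, current = [], 0
    for used in block_map:
        if used:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


def is_fragmented(block_map):
    """Free space is fragmented when it is split into more than one run."""
    return len(free_runs(block_map)) > 1


print(free_runs([0, 0, 1, 0, 1, 1, 0, 0, 0]))  # [2, 1, 3]
print(is_fragmented([0, 0, 0, 1, 1]))          # False
```

Contiguity of block numbers is sufficient here because the region is one-dimensional. On a mesh or torus, whether a set of free processors is usable also depends on how the processors are connected, which this check does not capture.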
Examples of related art are Japanese Laid-open Patent Publications Nos. 2010-204880, 2009-070264, and 2008-71294 and International Publication Pamphlet WO 2005/116832.
Thus, according to an aspect, an object of the disclosure is to provide a technique for improving the efficiency of the parallelization of a search process by a job scheduler executed by a job managing device configured to manage a computer system including a plurality of computers.