In parallel computer systems, which are information processing systems, and in particular in High Performance Computing (HPC) systems, systems that include over 100,000 nodes have been developed to improve performance. A node mentioned here is a unit of processing that executes information processing and is, for example, an arithmetic processing unit such as a central processing unit (CPU).
If the number of nodes becomes enormous, an indirect network, in which nodes communicate with each other via a typical crossbar switch, needs an enormous amount of switch resources, which is impractical. Accordingly, if the number of nodes becomes enormous, a direct network, in which adjacent nodes are directly connected, is often used as the interconnect. In a direct network, communication between nodes that are not adjacent to each other is performed via interconnect routers that are implemented on the intermediate nodes between the two communicating nodes. A mesh type or a torus type is often used as the network topology of the direct network, i.e., the structure of the network.
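The difference between the mesh type and the torus type can be illustrated by counting the links that a message crosses between two nodes. The following is a minimal sketch only; the function name `hops` and the coordinate representation are assumptions introduced here for illustration, and shortest-path routing along each dimension is assumed.

```python
from typing import Sequence


def hops(a: Sequence[int], b: Sequence[int], dims: Sequence[int], torus: bool) -> int:
    """Minimum number of links between nodes a and b (hypothetical helper).

    a, b  : node coordinates, one entry per dimension
    dims  : number of nodes along each dimension
    torus : True for a torus (wrap-around links), False for a mesh
    """
    total = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        # In a torus, the message may also travel the other way around the ring.
        total += min(d, n - d) if torus else d
    return total


if __name__ == "__main__":
    # Opposite corners of a 4x4 network.
    print(hops((0, 0), (3, 3), (4, 4), torus=False))  # 6 links in a mesh
    print(hops((0, 0), (3, 3), (4, 4), torus=True))   # 2 links in a torus
```

A path of h links passes through h - 1 intermediate nodes, each of which forwards the message with its interconnect router.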
In a parallel computer system that uses the direct network, if a node that executes a parallel job-B is present between nodes that execute a parallel job-A (hereinafter, “parallel job” is simply referred to as “job”), the communications between the nodes that execute these jobs affect each other, resulting in reduced communication performance. Specifically, this state eventually results in an increase in the time taken to execute both jobs. Accordingly, to improve the job execution performance, a parallel job is placed in a group of adjacent nodes. Furthermore, a parallel application is designed such that, in calculation, communication with adjacent nodes is performed more often than communication with other nodes.
In a parallel computer system that has a mesh type or torus type interconnect (hereinafter, simply referred to as a “parallel computer system”), the communication efficiency is improved when the network topology matches the communication structure of the application, and, furthermore, the job execution performance is increased. For example, in a parallel computer system that executes a parallel application needing a large number of adjacent communications for a two-dimensional data array, the performance is improved when the parallel application is executed in a two-dimensionally arrayed node group compared with a case in which the parallel application is executed in a one-dimensionally arrayed node group. Consequently, in parallel computer systems in general, when a user submits a job, the user specifies a shape of node group suitable for the job. In the parallel computer system, a job scheduler determines which job is executed by which node. This management work is called “job placement”. The job scheduler does not simply determine the job placement in accordance with the number of nodes but performs the job placement calculation by taking the job shape into consideration.
In the description below, the placement shape of nodes suitable for a job is referred to as a “job shape”. Furthermore, job shapes with different sizes are treated as different job shapes. For example, even if two job shapes are both square, the job shape of 3×3 nodes and the job shape of 5×5 nodes are treated as different job shapes. Furthermore, if the communication performance in each dimensional direction in a network is the same, the same execution performance can be obtained even if the job shape is rotated. Consequently, job shapes that coincide when rotated are treated as the same job shape. For example, the job shape of 2×3 nodes and the job shape of 3×2 nodes are treated as the same job shape.
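The treatment of job shapes described above can be sketched as a comparison of normalized shapes. This is a minimal illustration only; the names `canonical_shape` and `same_job_shape` are hypothetical, and the sketch assumes that the communication performance is the same in every dimensional direction so that a rotated shape is equivalent.

```python
from typing import Tuple


def canonical_shape(shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return a rotation-independent key for a rectangular job shape.

    Rotating a rectangular shape only permutes its extents, so sorting
    the extents maps every rotation of a shape to the same key.
    """
    return tuple(sorted(shape))


def same_job_shape(a: Tuple[int, ...], b: Tuple[int, ...]) -> bool:
    """True if a and b are the same job shape up to rotation."""
    return canonical_shape(a) == canonical_shape(b)


if __name__ == "__main__":
    print(same_job_shape((2, 3), (3, 2)))  # True: one is the other rotated
    print(same_job_shape((3, 3), (5, 5)))  # False: both square, different sizes
```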
As a job placement method in a conventional parallel computer system, there is a method that, when a certain job is placed, determines the placement of the job such that as many subsequent jobs submitted in the future as possible can be placed. For example, there is a job placement method that determines the placement such that the number of nodes in a group of adjacent unused nodes after the job placement becomes the maximum. In this case, for example, if there are multiple placement candidates for which the number of nodes in the unused node group is the maximum, the placement is determined such that priority is given first to the number of adjacent unused nodes in the downward direction and then to the number of adjacent unused nodes in the rightward direction.
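One possible reading of this conventional method, for a two-dimensional mesh, is sketched below. It is an interpretation rather than the method as actually implemented: the sketch assumes that “the number of nodes in adjacent unused nodes” means the size of the largest connected group of unused nodes, and that the tie-break counts the unused nodes directly below and directly to the right of the placed job. All names (`Grid`, `largest_free_region`, `free_run`, `place_job`) are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Grid = List[List[bool]]  # True = node in use, False = unused


def largest_free_region(grid: Grid) -> int:
    """Size of the largest connected group of unused nodes (4-neighbour)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] or seen[r][c]:
                continue
            size, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first search over unused neighbours
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best


def free_run(grid: Grid, r: int, c: int, dr: int, dc: int) -> int:
    """Count consecutive unused nodes from (r, c) stepping by (dr, dc)."""
    n = 0
    while 0 <= r < len(grid) and 0 <= c < len(grid[0]) and not grid[r][c]:
        n += 1
        r, c = r + dr, c + dc
    return n


def place_job(grid: Grid, height: int, width: int) -> Optional[Tuple[int, int, int, int]]:
    """Return (row, col, height, width) of the chosen placement, or None."""
    rows, cols = len(grid), len(grid[0])
    best_key, best_pos = None, None
    # Try the shape and its rotation; dict.fromkeys removes the duplicate for squares.
    for h, w in dict.fromkeys([(height, width), (width, height)]):
        for r in range(rows - h + 1):
            for c in range(cols - w + 1):
                cells = [(y, x) for y in range(r, r + h) for x in range(c, c + w)]
                if any(grid[y][x] for y, x in cells):
                    continue  # overlaps a node that is already in use
                trial = [row[:] for row in grid]
                for y, x in cells:
                    trial[y][x] = True
                below = sum(free_run(trial, r + h, x, 1, 0) for x in range(c, c + w))
                right = sum(free_run(trial, y, c + w, 0, 1) for y in range(r, r + h))
                # Primary criterion first, then downward, then rightward.
                key = (largest_free_region(trial), below, right)
                if best_key is None or key > best_key:
                    best_key, best_pos = key, (r, c, h, w)
    return best_pos


if __name__ == "__main__":
    empty = [[False] * 4 for _ in range(4)]
    print(place_job(empty, 2, 2))  # (0, 0, 2, 2)
```

On an empty 4×4 mesh every placement of a 2×2 job leaves one connected group of 12 unused nodes, so the tie-break decides, and the job is placed in the upper-left corner, where the most unused nodes remain below and to the right.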
If a large number of jobs with various job shapes are submitted to a parallel computer system, the unused nodes may be divided into small unused node groups. This situation is called “fragmentation”.
There is a conventional technology that calculates the utilization rate per program on the basis of the time occupied by a program, the execution frequency, and the utilization rate of a computing node and then allocates the programs to multiple computers in descending order of program size.
Patent Document 1: Japanese Laid-open Patent Publication No. 2011-243112
If jobs are placed such that the number of unused nodes is simply increased, when a job is placed afterwards, even if the total number of the unused nodes meets the number of nodes requested by the job, there may be a case in which the job is not able to be placed because its job shape does not fit the unused nodes. In such a case, the job that is not able to be placed enters a waiting state, and thus the remaining nodes on which the job is not able to be placed remain unused.
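The case described above can be reproduced with a small example. The following is a minimal sketch for a two-dimensional mesh; the 4×4 layout and the names `count_free` and `can_place` are assumptions chosen for illustration.

```python
from typing import List

Grid = List[List[bool]]  # True = node in use, False = unused


def count_free(grid: Grid) -> int:
    """Total number of unused nodes."""
    return sum(not used for row in grid for used in row)


def can_place(grid: Grid, height: int, width: int) -> bool:
    """True if a height x width job (or its rotation) fits on unused nodes."""
    rows, cols = len(grid), len(grid[0])
    for h, w in dict.fromkeys([(height, width), (width, height)]):
        for r in range(rows - h + 1):
            for c in range(cols - w + 1):
                if all(not grid[y][x]
                       for y in range(r, r + h) for x in range(c, c + w)):
                    return True
    return False


if __name__ == "__main__":
    # '#' marks nodes used by a running 2x2 job in the middle of a 4x4 mesh.
    layout = ["....",
              ".##.",
              ".##.",
              "...."]
    grid = [[ch == "#" for ch in row] for row in layout]

    print(count_free(grid))       # 12 unused nodes
    print(can_place(grid, 3, 3))  # False: 9 nodes requested, but no 3x3 area is free
    print(can_place(grid, 1, 4))  # True: a 1x4 job still fits along an edge
```

Here 12 nodes are unused and the job requests only 9, yet the 3×3 job has to wait because every 3×3 area overlaps the running job.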
In an actual system, when jobs are queued in a job queue, even if the waiting job at the top of the job queue is not able to be executed due to the situation described above, a subsequent job in the job queue may possibly be executed. Whether a job is executed by using this method, which is usually referred to as “backfill”, depends on the system management policy and varies for each parallel computer system. Furthermore, even if the policy permits the subsequent job to be executed in advance, small-sized fragmentation may further occur because the structure of the unused nodes does not match the job shape of the job that can be executed. Consequently, it is inevitable that unused nodes are generated. In this way, the utilization of the nodes is decreased due to fragmentation.
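The effect of the backfill policy can be sketched as follows. This is a simplified illustration only: it considers job shapes alone and ignores execution times and reservations, which an actual job scheduler would typically also take into account. The names `first_fit` and `schedule`, the first-fit placement rule, and the example queue are assumptions.

```python
from typing import List, Optional, Tuple

Grid = List[List[bool]]  # True = node in use, False = unused


def first_fit(grid: Grid, height: int, width: int) -> Optional[Tuple[int, int, int, int]]:
    """First free position (row, col, h, w) for the job or its rotation."""
    rows, cols = len(grid), len(grid[0])
    for h, w in dict.fromkeys([(height, width), (width, height)]):
        for r in range(rows - h + 1):
            for c in range(cols - w + 1):
                if all(not grid[y][x]
                       for y in range(r, r + h) for x in range(c, c + w)):
                    return r, c, h, w
    return None


def schedule(grid: Grid, queue: List[Tuple[str, Tuple[int, int]]],
             allow_backfill: bool) -> List[str]:
    """Start jobs in queue order; return the names of the started jobs.

    The grid is updated in place. Without backfill, scheduling stops at
    the first job that cannot be placed; with backfill, that job keeps
    waiting and the subsequent jobs are tried.
    """
    started = []
    for name, (height, width) in queue:
        pos = first_fit(grid, height, width)
        if pos is None:
            if allow_backfill:
                continue
            break
        r, c, h, w = pos
        for y in range(r, r + h):
            for x in range(c, c + w):
                grid[y][x] = True
        started.append(name)
    return started


if __name__ == "__main__":
    layout = ["....", ".##.", ".##.", "...."]  # '#' = running 2x2 job
    queue = [("A", (3, 3)), ("B", (1, 4)), ("C", (2, 2))]

    grid = [[ch == "#" for ch in row] for row in layout]
    print(schedule(grid, queue, allow_backfill=False))  # []

    grid = [[ch == "#" for ch in row] for row in layout]
    print(schedule(grid, queue, allow_backfill=True))   # ['B']
    print(sum(not used for row in grid for used in row))  # 8 nodes still unused
```

With backfill, job B is executed ahead of the waiting job A, but the 8 nodes that remain unused fit neither the 3×3 job A nor the 2×2 job C, which illustrates how fragmentation keeps nodes unused even when the policy permits backfill.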