A parallel computing system includes a plurality of nodes, which are physical computers, and an interconnected network that connects the nodes so that they can communicate with each other. Various topologies, such as Mesh, Torus, Hypercube, and Tree, are usable for the interconnected network. The interconnected network connects a large number of nodes and allows the nodes to operate in parallel, thereby improving operating performance. A parallel computing system with high operating performance may be used for large-scale scientific computing, such as complex simulations.
When a user submits a job, which is a unit of execution, a parallel computing system allocates one or more nodes to the submitted job. The allocated nodes execute the program of the submitted job. In the case where a plurality of jobs are submitted, the parallel computing system performs job scheduling. Job scheduling allocates computing resources, which are specified by “node”×“time”, to the plurality of jobs, thereby determining which nodes execute each job and at what time.
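The allocation of “node”×“time” resources described above can be illustrated with a minimal sketch. The greedy first-fit policy, the discrete time slots, and the names `Job`, `schedule`, `num_nodes`, and `horizon` are assumptions made for illustration only; they are not taken from any of the systems described here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # number of nodes the job requests
    duration: int   # expected execution period, in time slots


def schedule(jobs, num_nodes, horizon):
    """Greedy first-fit: place each job at the earliest start time at which
    enough nodes are free for the job's whole duration."""
    # busy[n][t] is True when node n is reserved in time slot t.
    busy = [[False] * horizon for _ in range(num_nodes)]
    plan = {}
    for job in jobs:
        for start in range(horizon - job.duration + 1):
            end = start + job.duration
            free = [n for n in range(num_nodes) if not any(busy[n][start:end])]
            if len(free) >= job.nodes:
                chosen = free[:job.nodes]
                for n in chosen:
                    for t in range(start, end):
                        busy[n][t] = True
                plan[job.name] = (chosen, start)
                break  # job placed; jobs that never fit are left out of the plan
    return plan


plan = schedule(
    [Job("A", 2, 3), Job("B", 2, 2), Job("C", 1, 4)],
    num_nodes=3,
    horizon=10,
)
for name, (nodes, start) in plan.items():
    print(name, "nodes", nodes, "start", start)
```

In this sketch, job B has to wait until job A releases its two nodes, while job C fits immediately on the remaining node.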
In this connection, there has been proposed a computer system for managing the memories of a plurality of nodes to which processes may be assigned. In the proposed computer system, when a node receives a request for generating a process, the node selects a page to be removed from its memory if the memory has no free area. If it is possible to move the selected page from the memory to another node, the node sends the page to the other node over a network. If it is not possible to move the selected page to any other node, however, the node moves the page to an external storage device of the node.
In addition, there has been proposed a job scheduling system for allocating processors to jobs. In this proposed job scheduling system, general scheduling is first performed. If the scheduling indicates that any processor is going to be idle at a certain time, the job scheduling system causes the idle processor to start a job that has not started, earlier than scheduled. If all processors become busy before the job that has started earlier is completed, the job scheduling system interrupts the job that has started earlier until any processor becomes idle. In addition, if the initially scheduled start time of the job that has started earlier than scheduled comes before the job is completed, the job scheduling system causes the processor allocated to the job in the general scheduling to take over the job.
Further, there has been proposed a computer system in which a process is temporarily interrupted with “checkpoint restart”. Checkpoint restart is a technique that saves the state of a process running on a node and later restarts the process from the saved state, either on the node that executed the process before the interruption or on a different node. The proposed computer system generates a node number conversion table indicating correspondences between logical node numbers and physical node numbers. When restarting the process, the computer system updates the node number conversion table, thereby making it possible to restart the job on a different node from the one used before the interruption.
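The role of the node number conversion table can be sketched as follows. This is only an illustration under simplifying assumptions: the process state is an in-memory dictionary, and the names `NodeNumberTable`, `checkpoint`, and `restart` are hypothetical rather than part of the proposed computer system.

```python
import copy


class NodeNumberTable:
    """Maps logical node numbers (seen by the job) to physical node numbers."""

    def __init__(self, physical_nodes):
        self.logical_to_physical = dict(enumerate(physical_nodes))

    def resolve(self, logical):
        return self.logical_to_physical[logical]

    def remap(self, physical_nodes):
        # The job keeps its logical numbering; only the physical side changes.
        if len(physical_nodes) != len(self.logical_to_physical):
            raise ValueError("restart needs the same number of nodes")
        self.logical_to_physical = dict(enumerate(physical_nodes))


def checkpoint(state):
    """Save a snapshot of the process state (assumed to be a plain dict)."""
    return copy.deepcopy(state)


def restart(saved_state, table, new_physical_nodes):
    """Update the conversion table, then resume from the saved state."""
    table.remap(new_physical_nodes)
    return copy.deepcopy(saved_state)


table = NodeNumberTable([4, 5])          # logical 0 -> physical 4, logical 1 -> physical 5
state = {"step": 120, "partial_sum": [1.5, 2.5]}
saved = checkpoint(state)
state["step"] = 999                      # later changes do not affect the snapshot
resumed = restart(saved, table, [7, 9])  # restart on different physical nodes
```

Because the job refers to nodes only by logical number, the restarted job does not need to know that its physical nodes have changed.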
Please see, for example, Japanese Laid-open Patent Publication Nos. 06-187308, 2010-182199, and 2011-186606.
Jobs need different numbers of nodes for execution and have different execution periods. Therefore, as a parallel computing system continues to operate, idle computing resources become fragmented in the resource space defined by “node”×“time”. Specifically, idle computing resources are discrete in the time domain, so that even if a new job is submitted, there may be no computing resource with a continuous idle period long enough for the expected execution period of the job. As a result, the fragmented idle computing resources remain unused, which lowers the efficiency of the use of nodes.
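Fragmentation in the time domain can be made concrete with a small sketch. The discrete time slots, the example reservations, and the helper name `idle_intervals` are assumptions chosen for illustration.

```python
def idle_intervals(busy, horizon):
    """Return the idle intervals of one node as (start, length) pairs.

    busy is the set of time slots in which the node is reserved.
    """
    intervals = []
    start = None
    for t in range(horizon):
        if t not in busy:
            if start is None:
                start = t          # an idle interval begins
        elif start is not None:
            intervals.append((start, t - start))
            start = None           # the idle interval ends
    if start is not None:
        intervals.append((start, horizon - start))
    return intervals


intervals = idle_intervals(busy={2, 3, 6}, horizon=8)
total_idle = sum(length for _, length in intervals)
longest_idle = max(length for _, length in intervals)

job_duration = 3
fits = longest_idle >= job_duration
print(intervals, total_idle, longest_idle, fits)
```

Here the node is idle for five slots in total, yet a job that needs three consecutive slots cannot be placed because the longest continuous idle period is only two slots.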
One conceivable approach is to allow jobs to be interrupted so that computing resources fragmented in the time domain can be used. For example, a parallel computing system uses a node to start a job and, when the start time of another job with higher priority approaches, causes the node to save the data stored in its memory to a save area, such as an auxiliary storage device of the node, and then to temporarily stop the job. After the other job is complete, the parallel computing system causes the node to load the saved data into the memory and then to restart the job. Alternatively, for example, the parallel computing system causes the node to transfer the data stored in the memory to another node that has become idle and then causes the other node to restart the job.
As described above, the efficiency of the use of nodes in a parallel computing system may be increased by performing job scheduling that involves job interruptions. However, when a schedule plan involving the transfer of job data between nodes is considered, the accuracy of the estimated transfer period becomes a problem. In the case where there is a node located between a transfer source node and a transfer destination node, the transfer period may vary greatly depending on other jobs running on that intermediate node and on the communication that the node performs for those jobs. Therefore, if the transfer period is estimated on the basis of static information such as hardware performance, a large error is likely to arise between the estimated transfer period and the actual transfer period.
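The gap between a static estimate and the actual transfer period can be illustrated with a deliberately simple model. The model is an assumption, not a measured result: it treats the transfer as limited by a single link and assumes that other jobs consume a fixed fraction of that link at the intermediate node. The function names and the figures are hypothetical.

```python
def static_estimate(data_gb, link_gb_per_s):
    """Transfer period estimated from hardware performance alone."""
    return data_gb / link_gb_per_s


def contended_transfer(data_gb, link_gb_per_s, relay_utilization):
    """Transfer period when other jobs already use part of the link.

    relay_utilization is the assumed fraction (0 <= value < 1) of the link
    consumed at the intermediate node by communication for other jobs.
    """
    if not 0.0 <= relay_utilization < 1.0:
        raise ValueError("relay_utilization must be in [0, 1)")
    return data_gb / (link_gb_per_s * (1.0 - relay_utilization))


estimated = static_estimate(8.0, 2.0)          # 8 GB over a 2 GB/s link
actual = contended_transfer(8.0, 2.0, 0.75)    # 75% of the link used by other jobs
print(estimated, actual)
```

Under these assumed figures the static estimate is 4 seconds, while the contended transfer takes 16 seconds, a fourfold error.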
In addition, low accuracy in the estimation of the transfer period prevents optimal scheduling, which makes it difficult to achieve highly efficient use of nodes. For example, assume that a parallel computing system determines, on the basis of an estimated data transfer period and the idle periods of other nodes, that a transfer destination node has a sufficient idle period for the execution of a job, and therefore determines to change the node allocation to the job. If the actual transfer period is nevertheless longer than estimated, so that the transfer destination node does not in fact have a sufficient idle period for the execution of the job, job interruptions occur frequently and the actual efficiency of the use of nodes decreases. Conversely, if it is recognized in advance that the transfer period is long, a schedule may be determined so as not to change the node allocation to the job. That is to say, the accuracy of the transfer period estimation in job scheduling affects the efficiency of the use of nodes.
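The effect of the estimation error on the allocation decision can be sketched as a simple feasibility check. The rule used here (transfer period plus remaining execution period must fit within the idle period of the destination node), the function name `can_migrate`, and all figures are assumptions for illustration.

```python
def can_migrate(idle_period, transfer_period, remaining_execution):
    """Return True when the destination node's idle period covers both the
    data transfer and the remaining execution of the job."""
    return transfer_period + remaining_execution <= idle_period


idle_period = 15.0          # idle period of the transfer destination node, in seconds
remaining_execution = 10.0  # remaining execution period of the job, in seconds

planned = can_migrate(idle_period, 4.0, remaining_execution)   # with the estimated period
realized = can_migrate(idle_period, 16.0, remaining_execution)  # with the actual period
print(planned, realized)
```

With the estimated transfer period the change of node allocation appears feasible, but with the actual transfer period it is not, so the job would have to be interrupted again on the destination node.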