In the related art, there is a known technology for performing a simulation based on numerical computation using a parallel computer that has plural computation nodes. As an example of this technology, a parallel computer system is known that divides a computation space to be simulated into multiple areas and executes the simulation for each of the divided areas on a different computation node.
The parallel computer system divides the computation space into the plural areas and regularly maps the divided areas to the plural computation nodes. That is, the parallel computer system maps each divided area to the computation node whose position relation among the computation nodes is the same as the position relation of that area among the areas. The parallel computer system causes each computation node to execute the simulation for the area mapped to it, thereby executing the simulation for the entire computation space.
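As a minimal sketch of the regular mapping described above, the following Python code assumes a two-dimensional computation space divided into an nx-by-ny grid of equal rectangular areas and a node network of the same shape. The names `divide_space` and `regular_mapping` are hypothetical and are not taken from the described system. Each area index (i, j) is mapped to the node at coordinates (i, j), so the position relation between areas is preserved among the nodes.

```python
from typing import Dict, Tuple

Index = Tuple[int, int]
Bounds = Tuple[float, float, float, float]  # (x0, x1, y0, y1)


def divide_space(width: float, height: float, nx: int, ny: int) -> Dict[Index, Bounds]:
    """Divide a width-by-height space into nx * ny equal rectangular areas."""
    dx, dy = width / nx, height / ny
    return {
        (i, j): (i * dx, (i + 1) * dx, j * dy, (j + 1) * dy)
        for i in range(nx)
        for j in range(ny)
    }


def regular_mapping(nx: int, ny: int) -> Dict[Index, Index]:
    """Map area (i, j) to node (i, j) of an nx-by-ny node grid (hypothetical scheme)."""
    return {(i, j): (i, j) for i in range(nx) for j in range(ny)}
```

For example, `divide_space(8.0, 4.0, 4, 2)[(1, 0)]` gives the bounds `(2.0, 4.0, 0.0, 2.0)`, and `regular_mapping(4, 2)` places that area on node `(1, 0)`.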
In this case, when each computation node executes the simulation using a difference method, each computation node communicates frequently with the computation nodes adjacent to it. Moreover, in a simulation where the correlation between areas becomes stronger as the distance between the areas decreases, the amount of communication between two computation nodes increases as the distance between them decreases. For this reason, the parallel computer system executes the simulation efficiently by using plural computation nodes connected through a direct interconnection network based on a multi-dimensional orthogonal-coordinate topology.
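To illustrate why a difference method leads to communication only between adjacent areas, the following Python sketch applies one explicit finite-difference step of the one-dimensional heat equation, once on the whole domain and once on a domain split into areas. The choice of the heat equation and of the explicit scheme is an assumption made for illustration; the source does not name a particular equation or scheme. In the split version, each area needs only one boundary ("halo") value from each adjacent area, and the two results agree exactly.

```python
from typing import List, Optional


def update_cell(left: float, centre: float, right: float, alpha: float) -> float:
    """Explicit finite-difference update for u_t = u_xx at one cell (alpha = dt / dx**2)."""
    return centre + alpha * (left - 2.0 * centre + right)


def step_global(u: List[float], alpha: float) -> List[float]:
    """Reference step on the whole domain; the two end cells are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = update_cell(u[i - 1], u[i], u[i + 1], alpha)
    return new


def step_decomposed(areas: List[List[float]], alpha: float) -> List[List[float]]:
    """Same step, where each area receives only one boundary ("halo") value
    from each adjacent area, i.e. it communicates with its neighbours only."""
    result = []
    for k, cells in enumerate(areas):
        left_halo: Optional[float] = areas[k - 1][-1] if k > 0 else None
        right_halo: Optional[float] = areas[k + 1][0] if k + 1 < len(areas) else None
        padded: List[Optional[float]] = [left_halo, *cells, right_halo]
        new_cells = list(cells)
        for i in range(len(cells)):
            left, right = padded[i], padded[i + 2]
            # A cell with no neighbour on one side is a global end cell: keep it fixed.
            if left is not None and right is not None:
                new_cells[i] = update_cell(left, cells[i], right, alpha)
        result.append(new_cells)
    return result
```

With `u = [float(i * i) for i in range(8)]` split into `u[:4]` and `u[4:]`, concatenating the two areas returned by `step_decomposed` reproduces `step_global(u, 0.25)`, while area 0 reads only `u[4]` from area 1 and area 1 reads only `u[3]` from area 0.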
FIG. 41 is a diagram illustrating a network of computation nodes connected by a direct interconnection network based on a mesh topology. In the example illustrated in FIG. 41, computation nodes (illustrated as circles) that are adjacent to each other are directly connected through a link. Since computation nodes connected in this way can communicate with adjacent computation nodes at high speed, the simulation can be executed efficiently when it uses the difference method or when adjacent areas are correlated.
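The direct links of a two-dimensional mesh can be sketched as follows, assuming nodes identified by integer coordinates (x, y) on an nx-by-ny grid; the function names are illustrative only. Adjacent nodes are one link apart, but the nodes at the two ends of a row are nx - 1 links apart because the mesh has no link between them.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def mesh_neighbors(node: Node, nx: int, ny: int) -> List[Node]:
    """Nodes directly linked to `node` in an nx-by-ny 2D mesh (no wraparound)."""
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < nx and 0 <= b < ny]


def mesh_hops(src: Node, dst: Node) -> int:
    """Minimum number of links between two mesh nodes (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])
```

For a 4-by-4 mesh, the corner node `(0, 0)` has only the two neighbors `(1, 0)` and `(0, 1)`, and `mesh_hops((0, 0), (3, 0))` is 3.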
FIG. 42 is a diagram illustrating a network of computation nodes connected by a direct interconnection network based on a torus (annular or toroidal) topology. In the example illustrated in FIG. 42, computation nodes (illustrated as circles) that are adjacent to each other are directly connected through a link, and the computation nodes positioned at both ends of the network are also directly connected through a link. Since computation nodes connected in this way can communicate at a higher speed than computation nodes connected by a mesh-topology network, even between the computation nodes at both ends, the simulation can be executed efficiently even when a correlation exists between both ends of the computation space, as in a simulation using periodic boundary conditions. Furthermore, since each computation node has more communication paths to the other computation nodes, the bisection bandwidth increases. As a result, the traffic load between the computation nodes is reduced.
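The effect of the wraparound links can be sketched in Python, assuming a k-by-k two-dimensional network with nodes at integer coordinates and links of equal bandwidth (an assumption made for illustration). The code builds the link sets of a mesh and of a torus, measures the hop count between the end nodes of a row in the torus, and counts the links cut when the network is divided into two halves along the x axis; under the equal-bandwidth assumption, that count is proportional to the bisection bandwidth. The function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Node = Tuple[int, int]
Link = FrozenSet[Node]


def build_links(k: int, torus: bool) -> Set[Link]:
    """Undirected links of a k-by-k mesh, plus wraparound links if torus is True."""
    links: Set[Link] = set()
    for x in range(k):
        for y in range(k):
            for dx, dy in ((1, 0), (0, 1)):
                a, b = x + dx, y + dy
                if torus:
                    a, b = a % k, b % k  # wrap the last node back to the first
                elif a >= k or b >= k:
                    continue  # a mesh has no link beyond its edge
                links.add(frozenset({(x, y), (a, b)}))
    return links


def ring_hops(a: int, b: int, n: int) -> int:
    """Shortest distance along one torus axis of length n."""
    d = abs(a - b)
    return min(d, n - d)


def torus_hops(src: Node, dst: Node, k: int) -> int:
    """Minimum number of links between two nodes of a k-by-k torus."""
    return ring_hops(src[0], dst[0], k) + ring_hops(src[1], dst[1], k)


def bisection_links(k: int, torus: bool) -> int:
    """Links crossing the cut between x < k/2 and x >= k/2 (k assumed even)."""
    half = k // 2
    return sum(
        1 for link in build_links(k, torus) if len({x < half for x, _ in link}) == 2
    )
```

For k = 4, the end nodes `(0, 0)` and `(3, 0)` are one link apart in the torus (three in the mesh), and the bisection cut crosses 8 torus links against 4 mesh links.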
However, with the above technology, in which the computation nodes are connected by a direct interconnection network based on a multi-dimensional orthogonal-coordinate topology, the areas are not mapped to appropriate computation nodes when failed computation nodes are present in the network.
That is, since the parallel computer system regularly maps each area divided by a program to a computation node, when failed computation nodes are present in the network, the areas are not mapped to computation nodes whose position relation matches the position relation of the areas. In this case, a computation node cannot communicate efficiently with the computation nodes to which the areas adjacent to its own area are mapped. As a result, communication efficiency between the computation nodes may deteriorate, which degrades the overall performance of the parallel computer system.
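The following Python sketch illustrates the problem under an assumed naive scheme that is not described in the source: the areas of an ax-by-ay grid are assigned in row-major order to the healthy nodes of an nx-by-ny mesh, also taken in row-major order, with failed nodes simply skipped. The names `map_in_order` and `worst_neighbor_hops` are hypothetical. Without failures, every pair of adjacent areas lands on nodes one link apart; with a single failed node at (1, 0) in a 4-by-4 mesh, some adjacent areas of a 4-by-3 area grid end up four links apart.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]


def row_major(nx: int, ny: int) -> List[Node]:
    """All (x, y) coordinates of an nx-by-ny grid, row by row."""
    return [(x, y) for y in range(ny) for x in range(nx)]


def map_in_order(area_shape: Tuple[int, int], node_shape: Tuple[int, int],
                 failed: Set[Node]) -> Dict[Node, Node]:
    """Hypothetical naive mapping: areas in row-major order onto healthy nodes."""
    areas = row_major(*area_shape)
    healthy = [n for n in row_major(*node_shape) if n not in failed]
    if len(healthy) < len(areas):
        raise ValueError("not enough healthy nodes")
    return dict(zip(areas, healthy))


def worst_neighbor_hops(mapping: Dict[Node, Node]) -> int:
    """Largest mesh distance between the nodes holding two adjacent areas."""
    worst = 0
    for (x, y), node in mapping.items():
        for neighbor in ((x + 1, y), (x, y + 1)):
            if neighbor in mapping:
                other = mapping[neighbor]
                hops = abs(node[0] - other[0]) + abs(node[1] - other[1])
                worst = max(worst, hops)
    return worst
```

With no failed node, `worst_neighbor_hops(map_in_order((4, 3), (4, 4), set()))` is 1; with `failed={(1, 0)}`, area `(1, 0)` shifts onto node `(2, 0)` and the worst case grows to 4.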
In addition, it is difficult to divide a torus network into plural smaller torus networks. For this reason, when a parallel computer system having a torus network executes multiple jobs at the same time (a multi-job operation), it is difficult to execute each job using its own torus network.
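Because each axis of a torus is a ring, a one-dimensional ring is enough to sketch why a torus does not divide into smaller tori. In the following Python sketch (illustrative names, assuming a six-node ring), splitting the ring into two halves of three nodes leaves each half without the link that would close it into a ring: the only wraparound link of the original ring connects nodes 5 and 0, which now belong to different halves. Each half therefore behaves as a mesh (a path), not as a torus.

```python
from typing import FrozenSet, List, Set

Link = FrozenSet[int]


def ring_links(n: int) -> Set[Link]:
    """Links of an n-node ring: i to i+1, plus the wraparound link n-1 to 0."""
    return {frozenset({i, (i + 1) % n}) for i in range(n)}


def is_path_within(links: Set[Link], members: List[int]) -> bool:
    """True if consecutive members are directly linked (a mesh-like line)."""
    return all(frozenset({a, b}) in links for a, b in zip(members, members[1:]))


def is_ring_within(links: Set[Link], members: List[int]) -> bool:
    """True if members form a closed ring using only the given links."""
    return (
        len(members) >= 3
        and is_path_within(links, members)
        and frozenset({members[-1], members[0]}) in links
    )
```

For `parent = ring_links(6)`, `is_ring_within(parent, [0, 1, 2, 3, 4, 5])` is true, while both `[0, 1, 2]` and `[3, 4, 5]` are only paths: closing them would need the links 2 to 0 and 5 to 3, which the original ring does not have.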
The technology discussed in the embodiments has been made in view of the above problems, and maps each area to a computation node having an appropriate position relation.