A common large-scale parallel computer system uses a communication network that is virtually divided into a plurality of parts so that the parts hardly interfere with one another.
One reason is to prevent the performance of each job from being degraded by communication interference between jobs when a single job performs parallel processing by using a plurality of processors and a plurality of jobs run on the system. Therefore, each job runs within a region whose communication traffic does not interfere with that of other regions.
Another reason is that it is undesirable for communication of a job set containing multiple jobs having a common attribute (for example, a user ID, a charge code, or an execution priority) to interfere with communication of another job set having a different attribute. Furthermore, for reasons of system management, jobs having a common attribute are preferably placed on processors that are physically near one another (for example, in the same rack or at the same installation location).
In the following, a set of processors serving as the unit of job placement in the parallel computer system is referred to as a node.
In a network of a large-scale parallel computer system, “nearness” on the network is in many cases represented by a plurality of independent parameters. For example, in a network having a hierarchical connection structure such as a fat tree, “nearness to the connection of the Nth layer” may be considered, for each different N, as a parameter representing an independent “nearness”. In the case of the fat tree, the number of stages connecting each node to the switches of the Nth stage corresponds to “nearness to the connection of the Nth layer”.
In addition, in the mesh or torus, which is a representative network topology (of various dimensions) for large-scale parallel computer systems, the network address itself is allocated as a lattice point in an N-dimensional space (a point whose coordinates are all integers). Coordinates of different dimensions are naturally independent parameters.
Other examples of parallel computer system networks in which addresses are allocated as lattice points in an N-dimensional space include the three-dimensional crossbar.
The relationship between “a plurality of independent parameters representing nearness” and a comprehensive “nearness” differs according to the network topology. Generally, in a network having a hierarchical connection structure, nearness with respect to the parameter of an upper layer (in the case of the fat tree, sharing as many upper-layer switches as possible, so that the number of hops in the upper layer is 0) contributes greatly to the comprehensive “nearness”. In the mesh or the torus, however, nearness in the coordinates of each dimension contributes to the comprehensive nearness to about the same extent.
Here, a difference between the mesh and the torus in the network topology will be described.
FIGS. 37A and 37B are diagrams illustrating network topologies. FIG. 37A illustrates a mesh and FIG. 37B illustrates a torus.
In the two network topologies, the method of assigning coordinates to nodes is common, and only the condition under which a communication link directly connects two nodes differs.
In the torus, as illustrated in FIG. 37B, in addition to the communication links present in the mesh, there is a wraparound link that connects two nodes whose coordinates in a certain dimension are the minimum value and the maximum value and whose coordinates in all other dimensions are the same. There are also systems in which the wraparound link is present in specific dimensions and absent in the other dimensions. Hereinafter, if the wraparound link is present in any dimension, the entire network is referred to as a torus, but a dimension in which the wraparound link is absent is processed in the same way as in the mesh.
Also, in the following, where the network topology of the system is the (N-dimensional) mesh or torus, the description uses the network address, that is, the N-dimensional coordinates.
It is assumed that, given 2N integers L(1), . . . , L(N), M(1), . . . , M(N) with L(i)<M(i) for all integers i (1≤i≤N), the coordinates of the nodes of the system are expressed as the N-tuples of integers {(x(1), . . . , x(N)) | L(i)≤x(i)≤M(i) for all i (1≤i≤N)}.
In this case, the number of node coordinates within the system is Π(M(i)−L(i)+1) (the product of (M(i)−L(i)+1) over all i).
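The coordinate range and the node count above can be illustrated with a minimal Python sketch. The function names `node_count` and `node_coordinates` and the three-dimensional example bounds are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod


def node_count(L, M):
    """Number of lattice points with L[i] <= x[i] <= M[i] in every dimension."""
    if len(L) != len(M) or any(lo >= hi for lo, hi in zip(L, M)):
        raise ValueError("need L(i) < M(i) for every dimension i")
    # Product of (M(i) - L(i) + 1) over all dimensions.
    return prod(hi - lo + 1 for lo, hi in zip(L, M))


def node_coordinates(L, M):
    """Iterate over every node coordinate as an N-tuple of integers."""
    return product(*(range(lo, hi + 1) for lo, hi in zip(L, M)))


# Hypothetical 3-dimensional system: 4 x 3 x 2 lattice points.
L = [0, 0, 1]
M = [3, 2, 2]
count = node_count(L, M)
coords = list(node_coordinates(L, M))
```

Here `count` equals the length of `coords`, since both are derived from the same bounds.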
When the network topology of the system is the (N-dimensional) mesh, a link directly connecting two nodes with coordinates (x(1), x(2), . . . , x(N)) and (y(1), y(2), . . . , y(N)) exists on the condition that, among the absolute coordinate differences |x(i)−y(i)| (1≤i≤N), exactly one is 1 and all the others are 0.
On the other hand, when the network topology of the system is the (N-dimensional) torus, a link directly connecting two nodes with coordinates (x(1), x(2), . . . , x(N)) and (y(1), y(2), . . . , y(N)) exists on the condition that, among the absolute coordinate differences |x(i)−y(i)| (1≤i≤N), exactly one is 1 or (M(i)−L(i)) and all the others are 0. Here, the case |x(i)−y(i)|=M(i)−L(i) corresponds to the above-described wraparound link.
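The two link conditions can be checked with a short Python sketch. The function name `has_link` and the per-dimension flag list `wrap` are hypothetical; `wrap[i]` marks whether dimension i has a wraparound link, so that a system with wraparound in only some dimensions is treated as a mesh in the remaining ones.

```python
def has_link(x, y, L, M, wrap=None):
    """Return True if a link directly connects nodes x and y.

    wrap[i] is True when dimension i has a wraparound link (torus).
    With wrap omitted, every dimension is treated as a mesh dimension.
    """
    n = len(x)
    if wrap is None:
        wrap = [False] * n
    # Exactly one coordinate may differ; all other differences must be 0.
    differing = [i for i in range(n) if x[i] != y[i]]
    if len(differing) != 1:
        return False
    i = differing[0]
    d = abs(x[i] - y[i])
    # Ordinary neighbor link, or wraparound link between minimum and maximum.
    return d == 1 or (wrap[i] and d == M[i] - L[i])


# Hypothetical 2-dimensional system with 4 x 3 nodes.
L = [0, 0]
M = [3, 2]
mesh_neighbor = has_link((0, 0), (1, 0), L, M)
mesh_edge_to_edge = has_link((0, 0), (3, 0), L, M)
torus_edge_to_edge = has_link((0, 0), (3, 0), L, M, wrap=[True, True])
partial_torus = has_link((0, 0), (0, 2), L, M, wrap=[True, False])
```

In this example the pair (0, 0) and (3, 0) is connected only when the first dimension has a wraparound link.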
When a plurality of jobs having different attributes are allocated to divided regions on a network, various pieces of information and conditions need to be considered. Specifically, the number of attributes that need to be distinguished from one another, the number of nodes required as a whole by the set of jobs of each attribute, and conditions on nearness and farness need to be taken into account. That is, jobs need to be placed on the network attribute by attribute.
Here, the key is how to divide a space defined by a set of a plurality of independent parameters related to nearness and farness.
Here, in a case where there is no advance information on the number of nodes required by the jobs of each attribute, it is considered appropriate to divide the nodes in the system into connected regions of substantially the same size. In addition, so that the regions do not differ in shape, it is preferable that the nearness and the farness from the reference point of a region do not differ greatly from region to region.
When advance information on the number of nodes required by the jobs of each attribute is provided, the ratio of the numbers of nodes to be distributed to the jobs of the respective attributes is expressed as a set of “fractions having a common denominator” that approximates the distribution ratio with a required accuracy. The initial allocation is then set by dividing all the nodes into as many connected regions of equal size as the common denominator and allocating to each job attribute as many regions as its numerator.
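As an illustration of fractions having a common denominator, the following Python sketch converts required node counts into numerators over a chosen denominator. The function name `region_shares`, the example attributes, and the use of largest-remainder rounding are assumptions made for this sketch; the text above does not prescribe a rounding rule.

```python
def region_shares(required_nodes, denominator):
    """Approximate the node distribution ratio by numerators over one denominator.

    required_nodes maps each attribute to its required number of nodes.
    Returns a dict of numerators that sum to `denominator`.
    Rounding uses the largest-remainder rule (an assumption of this sketch).
    """
    total = sum(required_nodes.values())
    if total <= 0 or denominator <= 0:
        raise ValueError("total node count and denominator must be positive")
    shares = {}
    remainders = []
    for attr, nodes in required_nodes.items():
        quotient, remainder = divmod(nodes * denominator, total)
        shares[attr] = quotient
        remainders.append((remainder, attr))
    # Hand the regions lost to rounding down to the largest remainders.
    leftover = denominator - sum(shares.values())
    remainders.sort(key=lambda item: item[0], reverse=True)
    for _, attr in remainders[:leftover]:
        shares[attr] += 1
    return shares


# Hypothetical requirements: three attributes needing 500, 300 and 200 nodes.
required = {"A": 500, "B": 300, "C": 200}
shares_8 = region_shares(required, 8)    # ratio approximated in eighths
shares_10 = region_shares(required, 10)  # ratio represented exactly in tenths
```

A larger denominator approximates the ratio more accurately but requires dividing the nodes into more equal-size connected regions.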
However, the number of attributes that need to be distinguished from one another (or the common denominator of the fractions representing the appropriate distribution ratio) varies according to system requirement specifications that are independent of the system configuration and the network topology. Therefore, even if the network topology is limited to the N-dimensional torus and the N-dimensional mesh in a large-scale system, it is not easy to set connected regions that suit the system configuration and the network topology for a given number of attributes to be distinguished, such that the distance from the reference point does not differ greatly from region to region.
For example, in a five-dimensional mesh or torus, a method of arranging the reference points of 20 different attribute values so that the distances between the reference points are as large as possible is quite different from a method of arranging the reference points of 50 or 100 attribute values in the same manner, and in neither case is the method obvious.
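To make "distance between the reference points" concrete, the following Python sketch measures distance as the minimum number of hops in a mesh or torus and reports the smallest distance among a set of candidate reference points. Measuring distance by hop count, the function names `hop_distance` and `min_pairwise_distance`, and the example points are assumptions of this sketch. It only evaluates a given arrangement and does not solve the arrangement problem itself.

```python
from itertools import combinations


def hop_distance(x, y, L, M, wrap=None):
    """Minimum number of hops between nodes x and y.

    wrap[i] is True when dimension i has a wraparound link (torus).
    """
    n = len(x)
    if wrap is None:
        wrap = [False] * n
    total = 0
    for i in range(n):
        d = abs(x[i] - y[i])
        if wrap[i]:
            size = M[i] - L[i] + 1  # number of coordinates in dimension i
            d = min(d, size - d)    # going around the ring may be shorter
        total += d
    return total


def min_pairwise_distance(points, L, M, wrap=None):
    """Smallest hop distance over all pairs of reference points."""
    return min(hop_distance(p, q, L, M, wrap) for p, q in combinations(points, 2))


# Hypothetical 2-dimensional 8 x 8 system with four candidate reference points.
L = [0, 0]
M = [7, 7]
points = [(0, 0), (0, 4), (4, 0), (4, 4)]
spread_torus = min_pairwise_distance(points, L, M, wrap=[True, True])
corner_mesh = hop_distance((0, 0), (7, 7), L, M)
corner_torus = hop_distance((0, 0), (7, 7), L, M, wrap=[True, True])
```

The corner-to-corner example shows that the same pair of coordinates can be far apart in a mesh and close together in a torus, which is one reason a good arrangement depends on the topology.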
Furthermore, job allocation is a process of allocating, when a job is input, a region having the size and arrangement shape required by the input job and of cancelling the allocation when the job ends. Even when the allocation range is completely fixed in advance, the procedure for repeating this process while suppressing fragmentation as much as possible and placing jobs having the same attribute value close to one another is unclear.
In particular, when the system scale is large, it is impractical to adopt a processing procedure whose search time is proportional to the number of nodes in the system. Even when the search range is limited to the vicinity of a reference point, if the number of reference points is relatively small, the search time remains proportional to the number of nodes in the system, so such a procedure cannot be adopted either.
Therefore, in the past, when jobs are allocated to regions on the network, the jobs are, for example, allocated sequentially to empty regions. As a result, it is impossible to place jobs together by attribute.
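The effect of sequential allocation can be seen in a toy Python sketch. The one-dimensional node array, the function name `first_fit`, and the job list are hypothetical simplifications introduced here. They stand in for allocation to the first empty nodes found and are not a description of any particular scheduler.

```python
def first_fit(num_nodes, jobs):
    """Allocate each job to the lowest-numbered free nodes, in arrival order.

    jobs is a list of (attribute, size) pairs.
    Returns a list giving the attribute that owns each node, or None if free.
    """
    owner = [None] * num_nodes
    for attr, size in jobs:
        free = [i for i, current in enumerate(owner) if current is None][:size]
        if len(free) < size:
            raise RuntimeError("not enough free nodes")
        for i in free:
            owner[i] = attr
    return owner


# Jobs of two attributes arriving alternately on a hypothetical 8-node system.
layout = first_fit(8, [("A", 2), ("B", 2), ("A", 2), ("B", 2)])
```

In the resulting layout the nodes of attribute A and attribute B alternate in blocks, so jobs of the same attribute are not placed together.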