Conventionally, computer systems with a plurality of nodes have a cluster configuration in which the nodes are connected to each other by interconnects. In the cluster configuration, the nodes execute distributed processes in parallel, which improves the performance of the computer system. In addition, even if some of the nodes fail and stop processing, the normal nodes continue to execute the processes, which improves the availability of the computer system.
A configuration example of a computer system having a cluster configuration will be described with reference to FIG. 21. FIG. 21 is a diagram illustrating a configuration example of a computer system having a cluster configuration. As illustrated in FIG. 21, a computer system 900 includes computing nodes 901 to 912, IO (input-output) nodes 913 to 916, and IO devices 917 and 918. The computing nodes 901 to 912 execute various arithmetic operations and application processes. The IO nodes 913 to 916 control input into and output from the IO devices. The IO devices 917 and 918 are storage devices storing data and applications, for example.
As illustrated in FIG. 21, interconnects connect the computing nodes to each other, the IO nodes to each other, and the computing nodes to the IO nodes. In the example of FIG. 21, the IO node 913 is connected to the IO device 917, the IO node 914 is connected to the IO device 918, the IO node 915 is connected to the IO device 918, and the IO node 916 is connected to the IO device 917. The connection illustrated by a dotted line between the IO node 914 and the IO device 918 is a standby connection that is used if an abnormality occurs at the connection illustrated by a solid line between the IO node 915 and the IO device 918. Similarly, the connection illustrated by a dotted line between the IO node 916 and the IO device 917 is a standby connection that is used if an abnormality occurs at the connection illustrated by a solid line between the IO node 913 and the IO device 917.
In the foregoing computer system, when transferring data to an IO device, a computing node first transfers the data by dimension-order routing to the IO node connected to the destination IO device. In dimension-order routing, for example, the data is first transferred from a source node along the X axis to a node whose X coordinate matches the X coordinate of the destination node. Then, the transfer direction is switched to the Y axis, and the data is transferred along the Y axis to the destination node.
For example, in a case where the computing node 901 illustrated in FIG. 21 transfers data to the IO device 918, the computing node 901 first transfers the data through the computing nodes 902, 903, 907, and 911 to the IO node 915 connected to the IO device 918. The communication path selected by dimension-order routing is generally preset in the computing nodes 901 to 912 and the IO nodes 913 to 916.
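The X-then-Y transfer described above can be sketched as follows. This is a minimal illustration, not the routing logic of any actual system. It assumes that the nodes 901 to 916 are arranged in a grid four nodes wide, numbered row by row, with the IO nodes 913 to 916 in the last row; this layout is inferred from the paths given in the text, and the names `coords`, `node_at`, and `dimension_order_route` are hypothetical.

```python
COLS = 4    # assumed grid width (inferred from the example paths)
BASE = 901  # reference numeral assumed to sit at (x=0, y=0)

def coords(node):
    """Map a reference numeral to assumed (x, y) grid coordinates."""
    return (node - BASE) % COLS, (node - BASE) // COLS

def node_at(x, y):
    """Inverse of coords(): reference numeral at grid position (x, y)."""
    return BASE + y * COLS + x

def dimension_order_route(src, dst):
    """Return the node sequence from src to dst, moving along X first, then Y."""
    x, y = coords(src)
    dst_x, dst_y = coords(dst)
    path = [src]
    while x != dst_x:              # first align the X coordinate
        x += 1 if dst_x > x else -1
        path.append(node_at(x, y))
    while y != dst_y:              # then travel along the Y axis
        y += 1 if dst_y > y else -1
        path.append(node_at(x, y))
    return path

print(dimension_order_route(901, 915))
# [901, 902, 903, 907, 911, 915]
```

Under the assumed layout, the result matches the path from the computing node 901 to the IO node 915 given in the example above.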
Patent Literature 1: Japanese Laid-Open Patent Publication No. 2010-218364
However, the foregoing related technique has a problem in that the availability of data transfer is not high.
For example, if an abnormality occurs at a node on a path selected by dimension-order routing, or at a connection (interconnect) between nodes on that path, a computing node fails to transfer data. Referring to FIG. 21, a description will be given of a case where the computing node 903 and the computing node 907 are disconnected. The computing node 901 can still reach the IO node 915 on paths that bypass the computing node 903, but the computing node 901 attempts to transfer data through the computing node 903 along the path selected by dimension-order routing. As a result, the computing node 901 fails to transfer the data to the IO node 915.
In addition, in the related technique, if a failure occurs at an active IO node connected to an IO device, a computing node may fail to transfer data. Referring to FIG. 21, a description will be given of a case where, when the computing node 906 transfers data to the IO device 918, an abnormality occurs at the active IO node 915 and the IO node 914 is switched from the standby state to the active state.
Even if the active node is switched from the IO node 915 to the IO node 914, the computing node 906 attempts to transfer data through the computing node 907 and the computing node 911 to the IO node 915 along the path selected by dimension-order routing. In this case, if the IO node 915 is unable to transfer the data to the IO node 914, the computing node 906 fails to transfer the data to the IO device 918.
Meanwhile, if the IO node 915 can transfer the data to the IO node 914, the computing node 906 can transfer the data to the IO device 918. However, the data reaches the IO node 914 through more nodes than the shortest path from the computing node 906 to the IO node 914 requires. For example, when data is to be transferred from the computing node 906 to the IO node 914, the path with the smallest number of nodes passes through the computing node 910. However, the computing node 906 follows the preset path and thus fails to select the shortest path. Accordingly, the transfer of the data by the computing node may be delayed.
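The disconnection case above can be illustrated with a short sketch. It assumes a grid four nodes wide and four nodes high, numbered row by row from 901, with links only between horizontally and vertically adjacent nodes; this layout is inferred from the paths in the text rather than taken from FIG. 21. The breadth-first search is used here only to show that a bypass path exists and is not part of the related technique. All function names are hypothetical.

```python
from collections import deque

COLS, ROWS, BASE = 4, 4, 901  # assumed layout, inferred from the example paths

def coords(node):
    return (node - BASE) % COLS, (node - BASE) // COLS

def node_at(x, y):
    return BASE + y * COLS + x

def dimension_order_route(src, dst):
    """Preset path: move along X first, then along Y."""
    x, y = coords(src)
    dst_x, dst_y = coords(dst)
    path = [src]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(node_at(x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(node_at(x, y))
    return path

def link(a, b):
    """Undirected interconnect between two adjacent nodes."""
    return frozenset((a, b))

def path_is_usable(path, failed_links):
    """True if no hop of the path crosses a failed interconnect."""
    return all(link(a, b) not in failed_links for a, b in zip(path, path[1:]))

def neighbors(node):
    x, y = coords(node)
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < COLS and 0 <= ny < ROWS:
            yield node_at(nx, ny)

def bypass_route(src, dst, failed_links):
    """Breadth-first search for a path that avoids failed links, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in neighbors(node):
            if nxt not in prev and link(node, nxt) not in failed_links:
                prev[nxt] = node
                queue.append(nxt)
    return None

failed = {link(903, 907)}                 # the disconnection from the example
preset = dimension_order_route(901, 915)
detour = bypass_route(901, 915, failed)
print(path_is_usable(preset, failed))     # False: the preset path is broken
print(detour is not None)                 # True: a bypass path still exists
```

In this sketch the preset path becomes unusable even though the search still finds a route to the IO node 915, which is the availability problem described above.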
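The difference in path length can be made concrete with a small hop count. The sketch assumes the same inferred layout (a grid four nodes wide, numbered row by row from 901, with the IO nodes 914 and 915 adjacent in the last row) and hypothetical helper names; it is an illustration only.

```python
COLS, BASE = 4, 901  # assumed layout, inferred from the example paths

def coords(node):
    return (node - BASE) % COLS, (node - BASE) // COLS

def node_at(x, y):
    return BASE + y * COLS + x

def dimension_order_route(src, dst):
    """Move along X first, then along Y."""
    x, y = coords(src)
    dst_x, dst_y = coords(dst)
    path = [src]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(node_at(x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(node_at(x, y))
    return path

def hops(path):
    """Number of interconnects traversed along a path."""
    return len(path) - 1

# Preset path to the formerly active IO node 915, which then forwards
# the data to the newly active IO node 914.
forwarded = dimension_order_route(906, 915) + [914]

# Path taken if the computing node 906 targeted the IO node 914 directly.
direct = dimension_order_route(906, 914)

print(forwarded, hops(forwarded))  # [906, 907, 911, 915, 914] 4
print(direct, hops(direct))        # [906, 910, 914] 2
```

Under these assumptions, the forwarded transfer takes four hops where two would suffice, which corresponds to the delay described above.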