In recent years, distributed memory parallel computers have become mainstream in the field of High Performance Computing (HPC). Such a computer is configured by connecting, through an interconnection, a large number of nodes that each include a processor and a memory and independently execute an Operating System (OS). In such a distributed memory parallel computer, processes are started on the respective nodes, and the started processes communicate with one another to exchange data and execute parallel computation. An interconnection includes a network that connects the nodes and devices for connecting the nodes to the network.
To transfer data reliably between two nodes, communication is controlled according to a procedure that is adapted to the characteristics of the network that connects the nodes and of the device that connects each node to the network. Such a procedure is also referred to as a protocol.
Protocol processing for the Transmission Control Protocol (TCP)/Internet Protocol (IP) that is used in the Internet is generally executed by the protocol stack of an OS. Each process inputs data to and outputs data from the protocol stack through a software interface such as Berkeley sockets. Protocol processing by the protocol stack is executed as a system process of the OS. That is, when protocol processing is executed, the processor is interrupted and control transfers to the OS kernel, and during that time it is difficult for the processor to execute other computation.
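As an illustration of the socket interface described above, the following is a minimal sketch in Python. It is not taken from any particular system: the function name `send_through_socket` is hypothetical, and a local connected socket pair is assumed as a stand-in for a real TCP/IP connection between two nodes. The point it shows is that every send and receive is a call into the OS, which is where the processor leaves the computation of the process.

```python
import socket


def send_through_socket(payload: bytes) -> bytes:
    """Send payload over a connected socket pair and return what is received.

    Assumption: a local socket pair stands in for a TCP/IP connection
    between two nodes; the send/receive calls are the same socket calls.
    """
    sender, receiver = socket.socketpair()
    try:
        # sendall() is a system call: control transfers to the OS kernel,
        # which executes the protocol processing on behalf of the process.
        sender.sendall(payload)
        # Signal end of data so the receive loop below can terminate.
        sender.shutdown(socket.SHUT_WR)

        chunks = []
        while True:
            chunk = receiver.recv(4096)  # each recv() is another system call
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
```

For a small payload, a call such as `send_through_socket(b"data")` returns the same bytes; the cost of interest here is not the copy itself but the transfers into the OS kernel that each call incurs.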
To increase the operating efficiency of a processor, that is, to bring execution performance closer to theoretical performance, it is preferable that the processor execute no protocol processing. In the field of HPC, data are therefore generally transferred by a method referred to as Remote Direct Memory Access (RDMA). In RDMA, an interconnection device directly reads memory that is managed by a transmission source process, transfers the data through the network that connects the nodes, and writes the data directly into memory that is managed by a destination process.
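The data movement of RDMA can be pictured with the following toy model. It is only a sketch under stated assumptions: two `bytearray` objects stand in for the memories managed by the source and destination processes, the hypothetical function `rdma_put` stands in for the interconnection device, and no real network or RDMA hardware is involved. It shows only the semantics, namely that data go from a source region directly into a destination region without passing through a protocol stack.

```python
def rdma_put(source_memory: bytearray, source_offset: int,
             destination_memory: bytearray, destination_offset: int,
             length: int) -> None:
    """Toy model of an RDMA write carried out by the interconnection device.

    Assumption: both memories live in one Python process; a real device
    would read the source over a bus and write the destination over a network.
    """
    if source_offset < 0 or destination_offset < 0 or length < 0:
        raise ValueError("offsets and length must be non-negative")
    if source_offset + length > len(source_memory):
        raise ValueError("read would run past the end of the source memory")
    if destination_offset + length > len(destination_memory):
        raise ValueError("write would run past the end of the destination memory")

    # memoryview slices refer to the underlying buffers, so the bytes are
    # written straight into the destination region with no staging buffer.
    source_view = memoryview(source_memory)[source_offset:source_offset + length]
    destination_view = memoryview(destination_memory)
    destination_view[destination_offset:destination_offset + length] = source_view
```

For example, with `src = bytearray(b"hello world")` and `dst = bytearray(8)`, the call `rdma_put(src, 6, dst, 2, 5)` places the five bytes `world` at offset 2 of `dst`.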
In RDMA as well, as in other protocol processing, processing starts upon receipt of communication start control. From the viewpoint of hardware, communication start control in RDMA is generally executed through memory-mapped registers that are mapped into a memory space. From the viewpoint of software, an interconnection is a kind of input/output device, and hence it is general for the OS to gather the communication start requests of the respective processes and for a device driver to control communication on their behalf. A device driver is a mechanism of an OS that gathers control requests from various processes and controls an input/output device on their behalf.
Communication start control through a device driver involves interrupting the processing of a process and transferring control to the OS kernel. This transfer is referred to as a process switch from the viewpoint of software and as a context switch from the viewpoint of hardware. It has a large overhead and hence is preferably avoided in HPC, where the computation time of a processor is important. For example, a plurality of sets of control registers, referred to as communication interfaces, are prepared, and each communication interface is mapped into the virtual memory space of one process and used exclusively by that process, so that communication start control through a device driver can be avoided. Such a communication interface corresponds to the device that connects a node to the network in an interconnection.
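The exclusive use of communication interfaces by processes can be sketched as follows. This is a simplified model, not an actual device or driver: the class names `CommunicationInterface` and `Interconnection`, the single `doorbell` field standing in for a memory-mapped control register, and the method names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CommunicationInterface:
    """One set of control registers (hypothetical, reduced to one register)."""
    index: int
    owner_pid: Optional[int] = None  # process that uses this interface exclusively
    doorbell: int = 0                # stands in for a memory-mapped register


class Interconnection:
    """Model of an interconnection with several communication interfaces."""

    def __init__(self, num_interfaces: int) -> None:
        self.interfaces: List[CommunicationInterface] = [
            CommunicationInterface(i) for i in range(num_interfaces)
        ]

    def allocate(self, pid: int) -> CommunicationInterface:
        """Give the first free interface to the process; done once at setup."""
        for interface in self.interfaces:
            if interface.owner_pid is None:
                interface.owner_pid = pid
                return interface
        raise RuntimeError("no free communication interface")

    def ring_doorbell(self, pid: int, index: int, request_count: int) -> int:
        """Start communication by writing the register of an owned interface.

        No request is funneled through a shared device driver here: the
        owning process writes its own register directly.
        """
        interface = self.interfaces[index]
        if interface.owner_pid != pid:
            raise PermissionError("interface is not allocated to this process")
        interface.doorbell += request_count
        return interface.doorbell
```

In this model the only step that resembles a transfer to the OS is `allocate`, which happens once; each subsequent communication start is a plain register write by the owning process. The model also makes visible that the number of interfaces is finite, so allocation fails once all of them are taken.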
In a parallel computer equipped with an interconnection that includes a plurality of communication interfaces, the communication interfaces are allocated not only to each process in a node but also to processes that execute communication for the system, such as file input/output or system control. Herein, communication for the system is communication by a process that operates constantly while a node is running, for example, management communication that is executed by a program that manages the whole system, or communication with a storage device.
The allocation of communication interfaces to processes that execute communication for the system tends to be fixed by system software. With such fixed allocation, extension of communication resources is difficult. In addition, in a case where communication interfaces are allocated fixedly, the interconnection is virtualized in software when virtual machines are introduced.
There is a conventional technique for a computer cluster that uses a correspondence relation between a global identifier that is allocated across the whole cluster system and a local identifier that is used in each computer.
Japanese Laid-open Patent Publication No. 2003-316637
However, in an interconnection that supports virtualization, the OS and a device driver manage the communication interfaces in order to hide the actual number of communication interfaces. In such a case, communication interfaces are dynamically allocated even to processes that execute communication for the system.
To communicate between dynamically allocated communication interfaces, the identifiers of the communication interfaces are exchanged between the nodes in advance by using TCP/IP or the like. This means that another communication means is substantially used, which is not suitable for communication for the system. For this reason, it is not preferable to dynamically allocate communication interfaces to processes that execute communication for the system.
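The preliminary exchange of identifiers described above can be sketched as follows. This is an illustrative assumption rather than a documented procedure: the function names are hypothetical, the identifier is assumed to fit in a 32-bit unsigned integer, and a local socket pair stands in for the TCP/IP connection between two nodes. It shows that a separate, already working communication path is needed before the dynamically allocated interfaces can be used at all.

```python
import socket
import struct
from typing import Tuple

ID_FORMAT = "!I"  # assumed wire format: one 32-bit unsigned integer
ID_SIZE = struct.calcsize(ID_FORMAT)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly size bytes, since recv() may return fewer."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed before sending a full identifier")
        data += chunk
    return data


def exchange_interface_ids(id_on_node_a: int, id_on_node_b: int) -> Tuple[int, int]:
    """Return (identifier learned by node A, identifier learned by node B).

    Assumption: both nodes are simulated in one process, with a socket
    pair standing in for the side channel between them.
    """
    side_a, side_b = socket.socketpair()
    try:
        # Each node announces the identifier it was dynamically allocated.
        side_a.sendall(struct.pack(ID_FORMAT, id_on_node_a))
        side_b.sendall(struct.pack(ID_FORMAT, id_on_node_b))
        # Each node then learns the identifier of its peer.
        learned_by_a = struct.unpack(ID_FORMAT, recv_exact(side_a, ID_SIZE))[0]
        learned_by_b = struct.unpack(ID_FORMAT, recv_exact(side_b, ID_SIZE))[0]
        return learned_by_a, learned_by_b
    finally:
        side_a.close()
        side_b.close()
```

With fixed allocation this step would be unnecessary, because each node would already know which communication interface its peer uses.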
On the other hand, in an interconnection that does not support virtualization, in a case where a communication interface is used fixedly for communication for the system, contention for communication interfaces may arise between virtual machines when the virtual machines are introduced. In such a case, virtualization by software is used to resolve the contention, but it is difficult to use in the field of HPC because the overhead of the software processing is large. Thus, in a case where a communication interface is used fixedly for communication for the system, other processes are affected, so that the throughput of the parallel processing apparatus is degraded.
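The competition between virtual machines over a fixedly used communication interface can be made concrete with a small sketch. The function name `find_interface_conflicts` and the virtual machine names are hypothetical; the sketch only assumes that each virtual machine claims one fixed interface index for its system communication.

```python
from collections import defaultdict
from typing import Dict, List


def find_interface_conflicts(fixed_claims: Dict[str, int]) -> Dict[int, List[str]]:
    """Map each interface index claimed by more than one VM to its claimants.

    fixed_claims maps a (hypothetical) virtual machine name to the interface
    index that its system communication is fixed to.
    """
    claimants: Dict[int, List[str]] = defaultdict(list)
    for vm_name, interface_index in fixed_claims.items():
        claimants[interface_index].append(vm_name)
    # Only indices with two or more claimants are in competition.
    return {
        index: sorted(names)
        for index, names in claimants.items()
        if len(names) > 1
    }
```

For instance, if two virtual machines on one node both fix their system communication to interface 0, `find_interface_conflicts({"vm0": 0, "vm1": 0, "vm2": 1})` reports interface 0 as claimed by both `vm0` and `vm1`, which is the situation that software virtualization would have to arbitrate.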
To avoid contention for communication interfaces between virtual machines, a method of multiplexing a virtualized input/output device may be considered. For example, an input/output device has conventionally been multiplexed by allocating one of a plurality of virtualized input/output devices to each virtual machine. In a case where such a method is applied to an interconnection, contention for communication interfaces between virtual machines when the virtual machines are introduced can be avoided. However, even though the virtual input/output devices are multiplexed in such a method, only one physical device is provided, so that avoidance of contention for setting registers or the like has to be realized by hardware, which complicates implementation.
The conventional technique for a computer cluster that uses a correspondence relation between a global identifier and a local identifier concerns information that is used for managing resources in the cluster, and is difficult to use for communication between nodes.