The present invention relates to an address server for managing network addresses and a method of attributing network addresses in a parallel computing environment.
In a parallel computing environment, such as a High Performance Computing environment, large parallel applications run on thousands of nodes. It is necessary to run those applications in containers in order to be able to checkpoint and restart the applications. The technique known as Checkpointing involves saving the state of a running application into a file such that the complete state may be restored and the application continued at a future time. The technique known as Restarting involves restoring the state from a checkpoint file and resuming execution in such a way that the application continues to run as if it had not been interrupted (but possibly on a different set of compute nodes).
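By way of illustration only, the checkpoint and restart cycle described above may be sketched with a toy computation whose entire state fits in a dictionary. The function names `run`, `checkpoint` and `restart`, and the use of Python's `pickle` module, are assumptions made for this sketch. A container-level checkpoint captures far more (process memory, open files, network connections) than this application-level example does.

```python
import pickle


def run(state, steps):
    """Advance a toy computation: add the current step index to a running total."""
    for _ in range(steps):
        state["total"] += state["step"]
        state["step"] += 1
    return state


def checkpoint(state, path):
    """Save the complete state of the computation into a file."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def restart(path):
    """Restore the state from a checkpoint file so execution can resume."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Running the computation for four steps, checkpointing, restarting (possibly in another process or on another node that can read the file) and running six more steps yields the same state as ten uninterrupted steps.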
It is possible during the checkpoint and restart operations to save and restore the state of the TCP/IP connections, but this requires the virtualization of the network. To do so, at least one virtual IP address is associated with each container. The TCP/IP connections go through the virtual address, so the TCP/IP connections may be moved from one node to another. For a given application, all the virtual IP addresses must belong to the same subnetwork. A virtual IP address must not be used at the same time by two different applications, as this would result in a TCP/IP conflict.
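The two constraints stated above (one subnetwork per application, and no virtual IP address shared between applications) may be checked as in the following minimal sketch. The function name `find_violations`, the layout of the `apps` mapping and the example addresses are hypothetical and chosen only for illustration; the sketch relies on Python's standard `ipaddress` module.

```python
import ipaddress


def find_violations(apps):
    """Check virtual IP assignments against the two constraints.

    `apps` maps an application name to a pair (subnet, list of virtual
    addresses), both given as strings. This layout is an assumption of
    the sketch.
    """
    violations = []
    owner = {}  # virtual address -> first application seen using it
    for app, (subnet, addresses) in apps.items():
        network = ipaddress.ip_network(subnet)
        for text in addresses:
            addr = ipaddress.ip_address(text)
            # Constraint 1: every address of an application lies in its subnetwork.
            if addr not in network:
                violations.append(f"{app}: {addr} is outside {network}")
            # Constraint 2: an address is never used by two applications at once.
            if addr in owner and owner[addr] != app:
                violations.append(f"{addr} is used by both {owner[addr]} and {app}")
            owner.setdefault(addr, app)
    return violations


# Hypothetical assignments: app_b wrongly reuses an address of app_a.
example = {
    "app_a": ("10.0.1.0/24", ["10.0.1.1", "10.0.1.2"]),
    "app_b": ("10.0.2.0/24", ["10.0.2.1", "10.0.1.2"]),
}
```

Calling `find_violations(example)` reports the reused address twice: once because it lies outside the subnetwork of `app_b`, and once because `app_a` already uses it.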