The operation of a local high-speed network depends on its capability to transfer data at high speed. In particular, for remote direct memory access (RDMA) via a network, for example in high-performance computing, the network has to provide lossless data transfer with high data rates and very low latency.
One approach to meeting these requirements is to use the Infiniband architecture. Infiniband is a trademark of the Infiniband Trade Association. A reference to Infiniband herein means a reference to the Infiniband specification, particularly the Infiniband architecture, issued by the Infiniband Trade Association. The Infiniband architecture comprises nodes that are connected via a fabric. A node can be a processor node, an I/O unit, and/or a router to another Infiniband sub-network.
The administration of an Infiniband network is performed by a so-called Infiniband subnet manager. The Infiniband subnet manager is a software entity whose task is to manage an Infiniband subnet, and it can reside on any node. The subnet manager discovers the topology of the subnet that it manages, assigns a subnet ID to each port, assigns an address to each port in the subnet, establishes the possible paths between all end nodes in the subnet, sets QoS parameters, and sweeps the subnet on a regular basis, looking for topology changes.
Only one subnet manager entity can be master within a subnet. All other started subnet manager entities in the subnet are in a standby state or in a not-active state. The subnet manager entity with the highest priority and, among entities of equal priority, the lowest globally unique identifier (GUID) has to become master during a failover from the current master.
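The selection rule described above can be illustrated with a minimal sketch. The class name SubnetManager, its fields, and the function elect_master are hypothetical names chosen for illustration only and are not taken from the Infiniband specification; the sketch merely assumes that each subnet manager entity exposes a numeric priority and a numeric GUID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    """Hypothetical record of one subnet manager entity (illustrative only)."""
    name: str      # free-form label, assumed for readability
    priority: int  # higher value wins
    guid: int      # lower value wins when priorities are equal


def elect_master(candidates):
    """Return the entity with the highest priority, ties broken by lowest GUID."""
    if not candidates:
        raise ValueError("at least one subnet manager is required")
    # Negating the priority lets a single min() express
    # "highest priority first, then lowest GUID".
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


# Example: two entities share the top priority; the lower GUID decides.
managers = [
    SubnetManager("sm_a", priority=5, guid=0x20),
    SubnetManager("sm_b", priority=5, guid=0x10),
    SubnetManager("sm_c", priority=3, guid=0x01),
]
master = elect_master(managers)
```

In this example, sm_c is excluded despite having the lowest GUID, because priority is compared first; the GUID only decides between sm_a and sm_b.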
FIG. 3 shows a state machine representation of an Infiniband subnet manager. After an initialization 101, the Infiniband subnet manager is in the discovering state 103. In this state, the Infiniband subnet manager sweeps the network to discover changes, for example, new nodes. When the subnet manager discovers a master or another subnet manager with a higher priority, it changes to a standby state 107. A subnet manager in the standby state may change, 109, back to the discovering state 103 upon a polling timeout or reception of a discover control packet. On reception of a disable control packet, 115, the Infiniband subnet manager may change from the standby state 107 to the not-active state 113, and, on reception of a standby control packet, 117, the Infiniband subnet manager may change from the not-active state 113 to the standby state 107. In case the subnet manager is in the standby state 107 and receives a handover control packet, 119, it changes from the standby state 107 to the master state 121 and thus becomes a master subnet manager. Also, when the discovering is completed, 111, and the subnet manager did not discover any other subnet manager, it can change from the discovering state 103 to the master state 121. The subnet manager can leave the master state 121 upon, for example, a response to a poll, a topology change, or reception of a handover control packet.
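The transitions of FIG. 3 can be sketched as a simple lookup table. The names State, Event, TRANSITIONS, and next_state are hypothetical and chosen for illustration only; they are not defined by the Infiniband specification. The transitions out of the master state 121 are deliberately not modeled, because the description above names the triggers but not the target states.

```python
from enum import Enum, auto


class State(Enum):
    DISCOVERING = auto()  # state 103
    STANDBY = auto()      # state 107
    NOT_ACTIVE = auto()   # state 113
    MASTER = auto()       # state 121


class Event(Enum):
    FOUND_MASTER_OR_HIGHER_PRIORITY = auto()  # seen while discovering
    DISCOVERY_DONE_NO_OTHER_SM = auto()       # transition 111
    POLLING_TIMEOUT = auto()                  # transition 109
    DISCOVER_PACKET = auto()                  # transition 109
    DISABLE_PACKET = auto()                   # transition 115
    STANDBY_PACKET = auto()                   # transition 117
    HANDOVER_PACKET = auto()                  # transition 119


# (current state, event) -> next state, as described for FIG. 3.
TRANSITIONS = {
    (State.DISCOVERING, Event.FOUND_MASTER_OR_HIGHER_PRIORITY): State.STANDBY,
    (State.DISCOVERING, Event.DISCOVERY_DONE_NO_OTHER_SM): State.MASTER,
    (State.STANDBY, Event.POLLING_TIMEOUT): State.DISCOVERING,
    (State.STANDBY, Event.DISCOVER_PACKET): State.DISCOVERING,
    (State.STANDBY, Event.DISABLE_PACKET): State.NOT_ACTIVE,
    (State.STANDBY, Event.HANDOVER_PACKET): State.MASTER,
    (State.NOT_ACTIVE, Event.STANDBY_PACKET): State.STANDBY,
}


def next_state(state, event):
    """Return the follow-up state; events not listed leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


# Example: a manager finds a master, is disabled, re-enabled, then handed mastership.
state = State.DISCOVERING
for event in (
    Event.FOUND_MASTER_OR_HIGHER_PRIORITY,
    Event.DISABLE_PACKET,
    Event.STANDBY_PACKET,
    Event.HANDOVER_PACKET,
):
    state = next_state(state, event)
```

Treating unlisted combinations as "no change" is an assumption of this sketch; an actual implementation might instead reject or log such events.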
A subnet manager uses control packets to command another subnet manager to change its state. A change of state is initiated by, for example, a handover control packet that is used by the current master to initiate handing over mastership of the subnet, an acknowledge control packet that is used by the new master to acknowledge the handover from the old master, a disable control packet that is used by the current master to transfer another subnet manager from the standby state to the not-active state, a standby control packet that is used by the current master to transfer another subnet manager from the not-active state to the standby state, and a discover control packet that is used to transfer a subnet manager from the standby state to the discovering state. A control packet may also be termed an administrative message and, for example, a standby control packet may for brevity be termed a standby message.
A problem is that a specific vendor implementation may include many features and much internal functionality that are not specified in the Infiniband specification. If a specific subnet manager is implemented with many additional features compared with a conventional subnet manager, there is no way to ensure that the specific subnet manager will remain the master within the subnet. If another, new subnet manager entity is present on a port with a higher priority, or with the same priority and a lower GUID, the specific subnet manager has to hand over the master state to the other subnet manager. Thus, it cannot be excluded that the new master is from another vendor and has not implemented the additional features. In this case, the Infiniband applications that depend on the Infiniband subnet manager with the additional features can no longer use these additional features, with the potential consequence that they have to terminate their execution.