The present invention relates generally to the field of configuring InfiniBand networks, and more particularly to the election and use of a manager that controls the configuration of an InfiniBand network.
InfiniBand® is an industry-standard specification that defines an input/output architecture used to interconnect servers, communications infrastructure equipment, storage and embedded systems. It is a computer network communications link used in high-performance computing, featuring very high throughput and very low latency, and it is used for data interconnect both among and within computers. InfiniBand® is a commonly used interconnect in supercomputers. It carries data between processors and I/O devices, offering throughput of up to 56 gigabits per second and support for up to 64,000 addressable devices.
The internal data flow system in most personal computers (PCs) and server systems is inflexible and relatively slow. As the amount of data coming into and flowing between components in the computer increases, the existing bus system becomes a bottleneck. Instead of sending data in parallel (typically 32 bits at a time, but in some computers 64 bits) across the backplane bus, InfiniBand® specifies a serial (bit-at-a-time) bus. Fewer pins and other electrical connections are required, which saves manufacturing cost and improves reliability. The serial bus can carry multiple channels of data at the same time in a multiplexed signal. InfiniBand® also supports multiple memory areas, each of which may be addressed by both processors and storage devices.
With InfiniBand®, data is transmitted in packets that together form a communication called a message. A message can be a remote direct memory access (RDMA) read or write operation, a channel send or receive message, a reversible transaction-based operation or a multicast transmission. Similar to the channel model many mainframe users are familiar with, a transmission or a message begins or ends with a channel adapter. Each processor has what is called a host channel adapter (HCA) and each peripheral device has a target channel adapter (TCA). HCAs are I/O engines located within a server. TCAs enable remote storage and network connectivity into the InfiniBand® interconnect infrastructure, called a fabric.
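To illustrate how a message is carried as a sequence of packets, the following Python sketch splits a message into packets no larger than the path MTU and tags each packet with its position in the message. It is a simplified model, not an implementation of the InfiniBand® transport: the `Packet` class and the `segment_message` and `reassemble` helpers are hypothetical, packet headers are omitted, and in-order delivery is assumed. The MTU sizes, the FIRST/MIDDLE/LAST/ONLY packet positions and the 24-bit packet sequence number follow InfiniBand® transport conventions.

```python
from dataclasses import dataclass

# Path MTU sizes (bytes) permitted by InfiniBand.
VALID_MTUS = (256, 512, 1024, 2048, 4096)
PSN_MODULUS = 1 << 24  # packet sequence numbers are 24 bits wide


@dataclass
class Packet:
    psn: int        # packet sequence number
    position: str   # "ONLY", "FIRST", "MIDDLE" or "LAST"
    payload: bytes


def segment_message(message: bytes, mtu: int, start_psn: int = 0) -> list[Packet]:
    """Split one message into packets whose payload is at most `mtu` bytes."""
    if mtu not in VALID_MTUS:
        raise ValueError(f"unsupported MTU: {mtu}")
    # In this sketch an empty message is sent as a single zero-length packet.
    chunks = [message[i:i + mtu] for i in range(0, len(message), mtu)] or [b""]
    last = len(chunks) - 1
    packets = []
    for idx, chunk in enumerate(chunks):
        if last == 0:
            position = "ONLY"
        elif idx == 0:
            position = "FIRST"
        elif idx == last:
            position = "LAST"
        else:
            position = "MIDDLE"
        # The sequence number wraps around at 2**24.
        packets.append(Packet((start_psn + idx) % PSN_MODULUS, position, chunk))
    return packets


def reassemble(packets: list[Packet]) -> bytes:
    """Rebuild the message, assuming packets arrive in order."""
    return b"".join(p.payload for p in packets)


packets = segment_message(b"x" * 10000, mtu=4096)
print([(p.psn, p.position, len(p.payload)) for p in packets])
# [(0, 'FIRST', 4096), (1, 'MIDDLE', 4096), (2, 'LAST', 1808)]
```

A 10,000-byte message over a 4096-byte path MTU therefore travels as three packets, and the receiver recovers the original message by concatenating their payloads.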
InfiniBand® links have both physical and logical state properties. The physical state of a link is negotiated in hardware, while the logical state is managed by software. When the physical link comes up, the logical state of the link is not yet active: no address is assigned to the port, and applications cannot communicate with the port using arbitrary data protocols. The only communication possible at that point is the sending and receiving of subnet management protocol (hereinafter SMP) unicast datagrams (hereinafter UD), which are used to discover and configure the network. InfiniBand® networks therefore require a subnet manager software entity running on one of the nodes.
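The interplay between the physical and logical link states can be modeled as a small state machine. The Python sketch below is a simplified, hypothetical model: it uses only a subset of the port states defined by the InfiniBand® Architecture Specification (Down, Init, Armed and Active for the logical state), and the `Port` class and its methods are illustrative names, not a real API. It shows that a port whose physical link is up, but which has not yet been configured, has no LID and may carry only SMP traffic.

```python
from enum import Enum
from typing import Optional


class PhysState(Enum):     # simplified subset of physical port states
    POLLING = "Polling"
    LINK_UP = "LinkUp"


class LogicalState(Enum):  # simplified subset of logical port states
    DOWN = "Down"
    INIT = "Init"
    ARMED = "Armed"
    ACTIVE = "Active"


class Port:
    def __init__(self) -> None:
        self.phys = PhysState.POLLING
        self.logical = LogicalState.DOWN
        self.lid: Optional[int] = None  # no address until the subnet manager assigns one

    def physical_link_up(self) -> None:
        # Negotiated in hardware; the logical state only advances to Init.
        self.phys = PhysState.LINK_UP
        self.logical = LogicalState.INIT

    def may_transmit(self, is_smp: bool) -> bool:
        if self.phys is not PhysState.LINK_UP:
            return False
        if self.logical is LogicalState.ACTIVE:
            return True
        return is_smp  # before Active, only subnet management traffic is allowed

    def assign_lid(self, lid: int) -> None:
        # Models the subnet manager configuring the port via SMPs:
        # it sets the LID and arms the port.
        if self.logical is not LogicalState.INIT:
            raise RuntimeError("LID can only be assigned in the Init state")
        self.lid = lid
        self.logical = LogicalState.ARMED

    def activate(self) -> None:
        if self.logical is not LogicalState.ARMED:
            raise RuntimeError("port must be Armed before it can become Active")
        self.logical = LogicalState.ACTIVE


port = Port()
port.physical_link_up()
print(port.lid, port.may_transmit(is_smp=True), port.may_transmit(is_smp=False))
# None True False
port.assign_lid(7)
port.activate()
print(port.may_transmit(is_smp=False))  # True
```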
The Subnet Manager uses SMP datagrams to discover and configure the network. Discovery is performed via directed routes (by specifying each hop of the source-to-destination path) and therefore does not require switch routing. The tasks of the Subnet Manager are to discover the fabric, assign local identifier (LID) addresses to each end-point, configure the switch routing tables and bring each end-point into the logical Active state. The Subnet Manager is also responsible for removing end-points that are no longer present from the routing tables, and for answering subnet administration (hereinafter SA) queries, which operate on its internal tables and handle multicast management. Once the Subnet Manager brings an end-point to the Active state, that end-point can exchange data with the other Active end-points in the fabric.
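The Subnet Manager tasks described above (directed-route discovery, LID assignment, switch routing configuration and activation) can be sketched in Python as follows. The fabric model, the function names and the LID policy are assumptions made for illustration: each node receives a single LID in discovery order, routes are computed as breadth-first shortest paths, and a switch's own LID maps to port 0, its management port. A real subnet manager would instead program the switches through SMPs and follow the specification's state transitions.

```python
from collections import deque

# Hypothetical fabric model: node -> {local port: (neighbor node, neighbor port)}.
Fabric = dict[str, dict[int, tuple[str, int]]]


def discover(fabric: Fabric, switches: set[str], sm_node: str) -> dict[str, list[int]]:
    """Directed-route discovery: each node is reached by an explicit list of
    egress ports, one per hop, so no switch forwarding tables are needed."""
    routes = {sm_node: []}
    queue = deque([sm_node])
    while queue:
        node = queue.popleft()
        if node != sm_node and node not in switches:
            continue  # channel adapters do not forward SMPs onward
        for port, (neighbor, _) in sorted(fabric[node].items()):
            if neighbor not in routes:
                routes[neighbor] = routes[node] + [port]
                queue.append(neighbor)
    return routes


def assign_lids(routes: dict[str, list[int]]) -> dict[str, int]:
    """Simplification: one LID per node, in discovery order, starting at 1."""
    return {node: lid for lid, node in enumerate(routes, start=1)}


def forwarding_table(fabric: Fabric, switches: set[str], switch: str,
                     lids: dict[str, int]) -> dict[int, int]:
    """Destination LID -> egress port on `switch`, following BFS shortest paths."""
    first_port: dict[str, int] = {switch: 0}  # the switch's own LID maps to port 0
    queue = deque([switch])
    while queue:
        node = queue.popleft()
        if node != switch and node not in switches:
            continue  # end-points are leaves; do not route through them
        for port, (neighbor, _) in sorted(fabric[node].items()):
            if neighbor not in first_port:
                first_port[neighbor] = port if node == switch else first_port[node]
                queue.append(neighbor)
    return {lids[node]: port for node, port in first_port.items() if node in lids}


def configure_subnet(fabric: Fabric, switches: set[str], sm_node: str):
    routes = discover(fabric, switches, sm_node)
    lids = assign_lids(routes)
    tables = {sw: forwarding_table(fabric, switches, sw, lids)
              for sw in switches if sw in routes}
    states = {node: "Active" for node in routes}  # every discovered node is activated
    return routes, lids, tables, states


# Two switches, three host channel adapters; the subnet manager runs on hca_a.
fabric = {
    "hca_a": {1: ("sw1", 1)},
    "hca_b": {1: ("sw1", 2)},
    "hca_c": {1: ("sw2", 2)},
    "sw1": {1: ("hca_a", 1), 2: ("hca_b", 1), 3: ("sw2", 1)},
    "sw2": {1: ("sw1", 3), 2: ("hca_c", 1)},
}
routes, lids, tables, states = configure_subnet(fabric, {"sw1", "sw2"}, "hca_a")
print(routes["hca_c"])               # [1, 3, 2]
print(tables["sw1"][lids["hca_c"]])  # 3
```

The directed route `[1, 3, 2]` reaches `hca_c` hop by hop before any forwarding table exists; once the tables are installed, `sw1` forwards traffic for the LID of `hca_c` out of port 3 toward `sw2`.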
The Subnet Manager is specified in the InfiniBand® Architecture Specification. The existing standard assumes a single Subnet Manager in the master role in the fabric and does not cover dynamic distributed configuration management (where configuration is supplied externally and must be distributed to the configuration manager). Enforcing an expected configuration within a network cluster by the subnet manager becomes a challenge, since the expected configuration cannot be propagated, nor a configuration manager elected, before the network is operational. Furthermore, if the configuration manager node fails to receive a configuration update, or is considered failed by other nodes in the cluster, it will not have an up-to-date configuration, and enforcing that configuration may break network connectivity.
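The stale-configuration problem can be made concrete with a sketch of master election modeled on the priority-then-GUID rule used for Subnet Manager election: the highest priority wins, and ties go to the lowest GUID. The `SmCandidate` class, its `config_version` field and the helper functions are hypothetical; `config_version` exists only to show that such an election does not consider which node holds the newest externally supplied configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmCandidate:
    name: str
    guid: int
    priority: int        # 0 (lowest) to 15 (highest)
    config_version: int  # hypothetical: version of the external configuration held


def elect_master(candidates: list[SmCandidate]) -> SmCandidate:
    """Highest priority wins; ties go to the lowest GUID. Configuration is ignored."""
    return min(candidates, key=lambda c: (-c.priority, c.guid))


def holds_stale_config(master: SmCandidate, candidates: list[SmCandidate]) -> bool:
    """True if some other candidate holds a newer configuration than the master."""
    return master.config_version < max(c.config_version for c in candidates)


nodes = [
    SmCandidate("node-a", guid=0x10, priority=5, config_version=3),  # missed an update
    SmCandidate("node-b", guid=0x20, priority=5, config_version=4),
    SmCandidate("node-c", guid=0x30, priority=2, config_version=4),
]
master = elect_master(nodes)
print(master.name, holds_stale_config(master, nodes))  # node-a True
```

In this example `node-a` wins the election on priority and GUID alone even though it missed the latest configuration update, so enforcing its configuration would apply an outdated state to the fabric.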