The present invention relates generally to the field of InfiniBand networks, and more particularly to achieving coherent in-band switch management in a distributed environment.
InfiniBand is an industry-standard specification that defines an input/output architecture used to interconnect servers, communications infrastructure equipment, storage, and embedded systems. It is a computer network communications link used in high-performance computing, including many supercomputers, that features very high throughput and very low latency. InfiniBand is used as a data interconnect both among and within computers. It carries data between processors and I/O devices at a throughput of up to 56 gigabits per second and supports up to 64,000 addressable devices.
The internal data flow system in most personal computers (PCs) and server systems is inflexible and relatively slow. As the amount of data coming into and flowing between components in the computer increases, the existing bus system becomes a bottleneck. Instead of sending data in parallel (typically 32 bits at a time, but in some computers 64 bits) across the backplane bus, InfiniBand specifies a serial (bit-at-a-time) bus. Fewer pins and other electrical connections are required, which saves manufacturing cost and improves reliability. The serial bus can carry multiple channels of data at the same time in a multiplexed signal. InfiniBand also supports multiple memory areas, each of which can be addressed by both processors and storage devices.
With InfiniBand, data is transmitted in packets that together form a communication called a message. A message can be a remote direct memory access (RDMA) read or write operation, a channel send or receive message, a reversible transaction-based operation, or a multicast transmission. As in the channel model familiar to many mainframe users, transmission begins or ends with a channel adapter. Each processor (a PC or a data center server, for example) has a host channel adapter (HCA), and each peripheral device has a target channel adapter (TCA). HCAs are I/O engines located within a server. TCAs enable remote storage and network connectivity into the InfiniBand interconnect infrastructure, called a fabric.
InfiniBand networks are set up and managed using in-band communication from a software entity called the Subnet Manager. Physical link management is done in hardware. Once the physical links are brought up, the Subnet Manager discovers the fabric and assigns addresses to the discovered switches, HCAs, and TCAs. The Subnet Manager communicates with the discovered devices using source-based routing, in which the sender specifies the route from end to end (direct routing in InfiniBand terms). Configuration and discovery are done by sending subnet management packets (SMPs), which are a class of management datagrams (MADs). Once addresses are assigned, the Subnet Manager configures the routing tables on the switches and moves all discovered link end-points to the logical “ACTIVE” state (in a non-active state, the end-points can only use direct routing for communication). In the active state, routing by destination address can be used. In this case the switch routes each packet by consulting its routing tables, which map destination addresses to switch port numbers. There are two types of switch routing tables. For unicast traffic, the linear forwarding table (LFT) is used; it maps a destination address to a single output switch port. For multicast traffic, the multicast forwarding table (MFT) is used; it maps a multicast destination address to a list of switch ports on which the packet is to be delivered.
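By way of illustration only, the two routing modes described above may be sketched as follows. The sketch is a simplified model and not an actual Subnet Manager or switch implementation: the `Switch` class, the `follow_direct_route` helper, the `links` topology, and all address and port values are hypothetical names and data introduced for this example. The sketch further assumes that multicast addresses occupy the range starting at 0xC000 and that a switch does not send a multicast packet back out of the port on which it arrived.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: addresses at or above this value are multicast.
MULTICAST_LID_BASE = 0xC000


@dataclass
class Switch:
    # LFT: destination address -> single output port (unicast).
    lft: dict = field(default_factory=dict)
    # MFT: multicast destination address -> list of output ports.
    mft: dict = field(default_factory=dict)

    def output_ports(self, dlid, in_port):
        """Return the ports on which a packet addressed to dlid leaves."""
        if dlid >= MULTICAST_LID_BASE:
            # Multicast: replicate to every listed port except the
            # arrival port (an assumption of this sketch).
            return [p for p in self.mft.get(dlid, []) if p != in_port]
        port = self.lft.get(dlid)
        # An address with no LFT entry has no output port in this model.
        return [] if port is None else [port]


def follow_direct_route(links, start, hop_ports):
    """Walk a direct route: the sender names the exit port at every hop."""
    node = start
    for port in hop_ports:
        node = links[(node, port)]  # (node, exit port) -> next node
    return node


# Hypothetical three-hop fabric: sm_host -> sw1 -> sw2 -> hca_b.
links = {("sm_host", 1): "sw1", ("sw1", 3): "sw2", ("sw2", 2): "hca_b"}
reached = follow_direct_route(links, "sm_host", [1, 3, 2])

# Destination-based routing on one switch after its tables are configured.
sw1 = Switch(lft={5: 3, 7: 1}, mft={0xC001: [1, 2, 3]})
unicast_out = sw1.output_ports(5, in_port=1)
multicast_out = sw1.output_ports(0xC001, in_port=1)
```

In this model a direct route requires no table state on the intermediate devices, which is why it remains usable before addresses are assigned, whereas destination-based routing depends entirely on the LFT and MFT contents written by the Subnet Manager.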