1. Technical Field
This invention relates to selecting an optimal communication path over a computer network. More specifically, the invention relates to a computer network configured with a computer operating in an InfiniBand network and a computer operating in a non-InfiniBand network, and a set of protocols to determine an optimal gateway for transmitting messages between the computers.
2. Description of the Prior Art
Input/Output (I/O) networks, such as system buses, are used by a processor to communicate with peripherals, such as network adapters. However, architectural constraints in common I/O networks, such as the Peripheral Component Interconnect (PCI) bus, limit the overall performance of computers. As a result, new types of I/O networks have been introduced.
One type of I/O network is known as the InfiniBand network, hereinafter IB. InfiniBand is an I/O architecture and specification for the transmission of data between processors and I/O devices. Instead of sending data in parallel, as PCI does, IB sends data serially and can carry multiple channels of data at the same time in a multiplexed signal. The IB network replaces the PCI or other bus currently found in computers with a packet-switched network, complete with zero or more routers. FIG. 1 is a prior art block diagram (100) of a system area network based on the IB architecture. The IB network is divided into separate autonomous management units, called subnets, each containing multiple IB nodes. As shown, six nodes, node0 (102), node1 (104), node2 (106), node3 (108), node4 (110), and node5 (112), are interconnected by a fabric (120) consisting of three switches, switch0 (122), switch1 (124), and switch2 (126). Each node connects to the fabric (120) through a channel adapter. The IB specification classifies channel adapters into two categories: host channel adapters (HCA) and target channel adapters (TCA). The HCA is the interface used to integrate IB with the operating system. The TCA is present on I/O devices, such as a RAID subsystem. As shown in FIG. 1, node2 (106), node3 (108), and node5 (112) represent peripherals and include TCAs (136), (138), and (142), respectively. Similarly, node0 (102), node1 (104), and node4 (110) represent operating systems and include HCAs (132), (134), and (140), respectively. Furthermore, in the example shown herein, each channel adapter may have one or more ports, and a channel adapter with more than one port may be connected to multiple switch ports. For example, channel adapter (140) has at least two ports, with a first port connected to switch0 (122) and a second port connected to switch1 (124).
Accordingly, as shown, multiple paths between a source and a destination are available in the IB architecture, providing performance and reliability benefits.
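To make the multipath point concrete, the following minimal sketch models a FIG. 1 style fabric as an undirected graph and enumerates loop-free paths with a depth-first search. Only the two links of HCA (140), one to switch0 and one to switch1, come from the description above; every other link in `LINKS`, and the helper names `build_graph` and `simple_paths`, are hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict

# Only the two node4 links come from the FIG. 1 description (HCA 140 has one
# port on switch0 and one on switch1). All other links are hypothetical.
LINKS = [
    ("node4", "switch0"),
    ("node4", "switch1"),
    ("switch0", "switch1"),
    ("switch1", "switch2"),
    ("switch2", "node5"),
    ("switch0", "node0"),
]


def build_graph(links):
    """Build an undirected adjacency map from (a, b) link pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def simple_paths(graph, src, dst):
    """Return every loop-free path from src to dst (iterative depth-first search)."""
    paths = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in sorted(graph[node]):
            if nxt not in path:  # never revisit a node, so paths stay loop-free
                stack.append((nxt, path + [nxt]))
    return paths


if __name__ == "__main__":
    graph = build_graph(LINKS)
    for p in simple_paths(graph, "node4", "node5"):
        print(" -> ".join(p))
```

With these assumed links, the search finds two paths from node4 to node5, one leaving through each port of HCA (140), which is the kind of redundancy that lets traffic survive a failed link or avoid a busy switch.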
IB components are assigned a global identifier (GID) during initialization. The GID uniquely identifies the target component both within and across IB subnets. Within a subnet, each port is also assigned a unique identifier called the local identifier (LID). IB routers, like IB switches, forward packets between their ports; the difference is that a router interconnects two or more subnets to form a larger, multi-domain system area network. Switches use the LIDs to route packets from the source to the destination within a subnet, whereas routers use the GIDs to route packets across domains.
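The split between LID-based and GID-based forwarding can be sketched as follows. This is a simplified model, not the IB packet format: it assumes a 16-bit LID and a 128-bit GID whose upper 64 bits are the subnet prefix, and the forwarding tables, prefixes, and the names `Packet`, `Switch`, `Router`, and `needs_router` are hypothetical. Header rewriting that a real router performs is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest_lid: int  # 16-bit local identifier, meaningful only within one subnet
    dest_gid: int  # 128-bit global identifier
    payload: bytes


def subnet_prefix(gid: int) -> int:
    """Return the upper 64 bits of a 128-bit GID, which identify the subnet."""
    return gid >> 64


def needs_router(local_prefix: int, pkt: Packet) -> bool:
    """A packet leaves the local subnet only if its GID prefix differs."""
    return subnet_prefix(pkt.dest_gid) != local_prefix


class Switch:
    """Forwards packets inside a subnet by destination LID."""

    def __init__(self, lid_table: dict[int, int]):
        self.lid_table = lid_table  # hypothetical LID -> output port map

    def out_port(self, pkt: Packet) -> int:
        return self.lid_table[pkt.dest_lid]


class Router:
    """Forwards packets between subnets by the destination GID's subnet prefix."""

    def __init__(self, prefix_table: dict[int, int]):
        self.prefix_table = prefix_table  # hypothetical prefix -> output port map

    def out_port(self, pkt: Packet) -> int:
        return self.prefix_table[subnet_prefix(pkt.dest_gid)]


if __name__ == "__main__":
    LOCAL = 0x1  # hypothetical subnet prefixes: 0x1 is local, 0x2 is remote
    local_pkt = Packet(dest_lid=5, dest_gid=(0x1 << 64) | 0xAA, payload=b"a")
    remote_pkt = Packet(dest_lid=9, dest_gid=(0x2 << 64) | 0xBB, payload=b"b")
    print(needs_router(LOCAL, local_pkt), Switch({5: 3}).out_port(local_pkt))
    print(needs_router(LOCAL, remote_pkt), Router({0x2: 1}).out_port(remote_pkt))
```

The design choice mirrors the text: the switch never looks at the GID, and the router never needs the destination's LID, because a LID has meaning only inside the subnet that assigned it.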
In order for an application to communicate with another application over the IB architecture, it must first create a work queue that consists of a queue pair, which is a pair of queues: one queue for send requests and one queue for receive requests. To execute an operation, the application must place a work queue element (WQE) in the work queue. The operation is then picked up for execution by the channel adapter. Accordingly, the work queue forms the communication medium between applications and the channel adapter.
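The queue pair mechanism can be illustrated with a toy model in which applications post WQEs and a channel adapter drains them. This is a hedged sketch under simplifying assumptions: the classes `WorkQueueElement`, `QueuePair`, and `ChannelAdapter`, and the `deliver` method that matches each send WQE with the next receive WQE, are hypothetical. The `post_send`/`post_recv` names only echo verbs-style terminology and are not the libibverbs API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkQueueElement:
    """Hypothetical WQE: an operation label plus the memory buffer it uses."""
    opcode: str
    buffer: bytearray


@dataclass
class QueuePair:
    """A queue pair: one queue for send requests, one for receive requests."""
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)

    def post_send(self, wqe: WorkQueueElement) -> None:
        self.send_queue.append(wqe)

    def post_recv(self, wqe: WorkQueueElement) -> None:
        self.recv_queue.append(wqe)


class ChannelAdapter:
    """Toy adapter that picks up posted WQEs and executes them in FIFO order."""

    def deliver(self, src: QueuePair, dst: QueuePair) -> int:
        """Match each send WQE on src with the next receive WQE on dst,
        copy the data into the receive buffer, and return bytes moved."""
        moved = 0
        while src.send_queue and dst.recv_queue:
            send_wqe, recv_wqe = src.send_queue[0], dst.recv_queue[0]
            n = len(send_wqe.buffer)
            if n > len(recv_wqe.buffer):
                raise ValueError("receive buffer too small")
            src.send_queue.popleft()
            dst.recv_queue.popleft()
            recv_wqe.buffer[:n] = send_wqe.buffer  # copy into receiver's memory
            moved += n
        return moved


if __name__ == "__main__":
    a, b = QueuePair(), QueuePair()
    rx = WorkQueueElement("RECV", bytearray(16))
    b.post_recv(rx)  # receiver posts a buffer before data arrives
    a.post_send(WorkQueueElement("SEND", bytearray(b"hello")))
    print(ChannelAdapter().deliver(a, b), bytes(rx.buffer[:5]))
```

The application side only appends WQEs; all data movement happens inside the adapter, which is the division of labor the paragraph describes.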
By having multiple paths available for transmitting data between nodes, the fabric is able to achieve transfer rates at the full capacity of the communication channel, avoiding congestion issues that may arise in a shared bus architecture.
Remote direct memory access (RDMA) is a communications technique used in IB that allows data to be transmitted from the memory of one computer to the memory of another computer without passing through either computer's CPU, without extensive buffering, and without calls to the operating system kernel. Because the data does not have to pass through the CPU, RDMA can transfer it faster. Although RDMA is supported in the IB architecture, it is not universally supported across all networks. There are circumstances where a computer on a non-IB network is in communication with a computer on an IB network, and data transfer and communication between the two computers is warranted. Such communication uses gateways between the IB and non-IB networks to transfer data packets between the two networks. A gateway is a node that serves as an entrance to another network. It is known in the art that a gateway can support RDMA data transfer between an IB network and a non-IB network. However, the prior art solutions for determining an optimal path for data transfer in such a circumstance are complex and costly. Accordingly, there is a need for a solution that efficiently determines an optimal communication path and data transfer technique, selecting between an RDMA-configured gateway and a generic gateway, such as an IP over InfiniBand (IPoIB) gateway, that are in communication with the IB network.