1. Field of the Invention
The present invention relates to a host channel adapter configured for communication with target channel adapters in an InfiniBand™ server system.
2. Background Art
Networking technology has seen improvements in server architecture and design, with the goal of providing servers that are more robust and reliable in mission-critical networking applications. In particular, the use of servers for responding to client requests requires that servers be extremely reliable to ensure that the network remains operable. Hence, server reliability, availability, and serviceability have become a substantial concern.
In addition, processors used in servers have improved substantially, to the point where microprocessor speed and bandwidth exceed the capacity of the connected input/output (I/O) buses, limiting server throughput to the bus capacity. Accordingly, different server standards have been proposed in an attempt to improve server performance in terms of addressing, processor clustering, and high-speed I/O.
These different proposed server standards led to the development of the InfiniBand™ Architecture Specification (Release 1.0), adopted by the InfiniBand™ Trade Association. The InfiniBand™ Architecture Specification specifies a high-speed networking connection between end nodes (e.g., central processing units, peripherals, etc.) and switches inside a server system. Hence, the term “InfiniBand™ network” refers to a private system area network (SAN) that connects end nodes and switches into a cluster within a server system, enabling the sharing of cluster resources. The InfiniBand™ Architecture Specification specifies both I/O operations and interprocessor communications (IPC).
A particular feature of the InfiniBand™ Architecture Specification is the proposed implementation in hardware of the transport layer services present in existing networking protocols, such as TCP/IP based protocols. The hardware-based implementation of transport layer services provides the advantage of reducing processing requirements of the central processing unit (i.e., “offloading” processor code execution), hence offloading the operating system of the server system.
However, substantial concerns arise if attempts are made to embed a host channel adapter (HCA) into a processor core, for example as a processor configured for InfiniBand™ communications. In particular, a stand-alone HCA device may have a prescribed number of external pins for a memory interface configured for accessing external memory. Adding the HCA memory interface, with its prescribed number of external pins, to a processor core that already has its own memory interface would result in an inefficient implementation having two memory interfaces, and hence excessive pins and a substantially higher packaging cost.
An additional concern when embedding an HCA into a processor core is the necessity of a small die size to reduce costs, resulting in a substantially smaller internal memory being available than if the HCA were implemented as a discrete device. However, conventional HCA architectures require substantially more memory for buffering between the Transport Layer and Link Layer transmit path (e.g., 256 kbytes) than typically would be permitted for an embedded HCA in a processor core (e.g., 16 kbytes).
The InfiniBand™ Architecture Specification requires that a packet sent via an HCA undergoes transport layer service, followed by link layer service, based on creation of a work queue entry (WQE) in system memory by an executable verbs consumer resource. Each work queue entry represents one message that needs to be transmitted for the verbs consumer resource. A message can be up to 2 gigabytes (GB) long; hence, a message may need to be broken down into packets that can be transmitted across the InfiniBand™ network. The size of the packet depends on the Maximum Transfer Unit (MTU) for the path to be used for transmitting the packet across the InfiniBand™ network: the MTU sizes may be 256, 512, 1024, 2048, or 4096 bytes. Hence, if an embedded HCA were allocated only 16 kbytes of memory for buffering between the Transport Layer and Link Layer transmit path, the HCA could store only four packets of the largest MTU size (4096 bytes).
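The packet arithmetic described above may be illustrated by the following minimal sketch. The function names are hypothetical and are not taken from the InfiniBand™ Architecture Specification; the sketch assumes that each packet carries a full MTU of payload, ignores header overhead, and handles only non-empty messages.

```python
# MTU sizes listed in the text, in bytes.
VALID_MTUS = (256, 512, 1024, 2048, 4096)


def packets_for_message(message_len: int, mtu: int) -> int:
    """Number of packets needed to carry message_len payload bytes.

    Assumption: every packet except possibly the last carries exactly
    one MTU of payload; headers are not counted.
    """
    if mtu not in VALID_MTUS:
        raise ValueError(f"unsupported MTU: {mtu}")
    if message_len <= 0:
        raise ValueError("this sketch handles only non-empty messages")
    # Ceiling division without floating point.
    return -(-message_len // mtu)


def packets_that_fit(buffer_bytes: int, mtu: int) -> int:
    """Number of whole MTU-sized packets that a buffer can hold."""
    return buffer_bytes // mtu


# A 16-kbyte buffer holds four packets of the largest MTU size.
print(packets_that_fit(16 * 1024, 4096))
# A 2 GB message at the largest MTU size needs 524288 packets.
print(packets_for_message(2 * 1024**3, 4096))
```

Under these assumptions, a single maximum-length message yields several orders of magnitude more packets than a 16-kbyte buffer can hold at one time.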
Examples of operations performed during transport layer service (performed, for example, by a transport layer module) include constructing a transport layer header, generating a packet sequence number, validating service type, etc., based on detecting a work notification of the work queue entry created in the system memory. Examples of operations performed during link layer service (performed, for example, by a link layer module) include service level to virtual lane mapping (SL-VL mapping), link layer flow control packet generation, link layer transmission credit checking, etc. Note that the transport layer module and the link layer module operate independently and in parallel; hence, the transport layer module attempts to supply the link layer module with packets to transmit, typically by constructing the packets and depositing the packets in an output buffer, while the link layer module continually withdraws packets from the output buffer and transmits them onto the InfiniBand™ network. A particular concern is that the HCA be able to continually transmit packets to keep the link “busy” and avoid gaps on the link (i.e., avoid “dead time” on the link).
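The relationship between the two modules and the output buffer may be modeled, purely for illustration, as a bounded first in first out queue. The class and method names below are hypothetical, and the sketch is a single-threaded software model under stated assumptions rather than a description of any actual HCA hardware.

```python
from collections import deque
from typing import Optional


class OutputBuffer:
    """Byte-limited FIFO between a transport layer and a link layer model."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._packets = deque()

    def deposit(self, packet: bytes) -> bool:
        """Transport side: store a packet, or return False if it does not fit."""
        if self.used_bytes + len(packet) > self.capacity_bytes:
            return False
        self._packets.append(packet)
        self.used_bytes += len(packet)
        return True

    def withdraw(self) -> Optional[bytes]:
        """Link side: remove the oldest packet, or return None if empty.

        A None result corresponds to a gap ("dead time") on the link.
        """
        if not self._packets:
            return None
        packet = self._packets.popleft()
        self.used_bytes -= len(packet)
        return packet


buf = OutputBuffer(16 * 1024)
packet = bytes(4096)                            # one maximum-MTU packet
accepted = [buf.deposit(packet) for _ in range(5)]
print(accepted)                                 # the fifth deposit is refused
```

In this model the transport side must retry a refused deposit after the link side has withdrawn a packet, and the link side sees dead time whenever the transport side has not kept the buffer supplied.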
Typically the transport layer module would service work queue entries by sequential processing of the respective work notifications, using a first in first out arrangement. However, the link layer operations within the HCA are configured for transmitting InfiniBand™ packets according to virtual lane prioritization. In particular, the InfiniBand™ Architecture Specification defines the virtual lanes as the means to implement multiple logical flows over a single physical link. An HCA may support up to 16 different virtual lanes, where each virtual lane has its own corresponding set of buffer resources, including link level flow control. Link level flow control in an InfiniBand™ network utilizes a token based system, where a link partner (e.g., a channel adapter or a switch) sends flow control tokens to the transmitting channel adapter each time buffer space is freed in the link partner. If the transmitting channel adapter does not have sufficient flow control tokens to accommodate an entire packet for a given virtual lane, the transmitting channel adapter cannot send any more packets for that virtual lane until more flow control tokens have been received.
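The token based flow control described above may be sketched as a per-lane counter. The class name, the method names, and the choice of one token per byte of receive buffer space are assumptions made for simplicity; the InfiniBand™ Architecture Specification defines its own credit granularity.

```python
class VirtualLaneFlowControl:
    """Per-virtual-lane token accounting on the transmitting side."""

    def __init__(self, num_lanes: int = 16) -> None:
        # No tokens are available until the link partner grants some.
        self.tokens = [0] * num_lanes

    def receive_tokens(self, vl: int, count: int) -> None:
        """Link partner reports that buffer space was freed for lane vl."""
        self.tokens[vl] += count

    def try_send(self, vl: int, packet_len: int) -> bool:
        """Consume tokens for an entire packet, or refuse to send it."""
        if self.tokens[vl] < packet_len:
            return False          # lane is stalled until more tokens arrive
        self.tokens[vl] -= packet_len
        return True


fc = VirtualLaneFlowControl()
fc.receive_tokens(vl=3, count=4096)
first = fc.try_send(vl=3, packet_len=4096)     # enough tokens: sent
second = fc.try_send(vl=3, packet_len=4096)    # tokens exhausted: refused
other = fc.try_send(vl=5, packet_len=256)      # lane 5 never received tokens
print(first, second, other)
```

Because the counters are kept per lane, a stall on one virtual lane does not by itself prevent transmission on another lane that still holds sufficient tokens.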
Hence, if an embedded HCA allocated a 16 kB output buffer has four stored packets, each of the maximum MTU size (4096 bytes), and the virtual lanes for those packets do not have enough flow control tokens, the link layer module would need to wait until more tokens are received for those virtual lanes before transmitting the data packets. In addition, if one attempted to store at least one 4 kB packet for each of the sixteen (16) supported virtual lanes, the size of the output buffer would expand to 64 kB, instead of the allowed 16 kB.
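The buffer sizing above follows from a simple product, shown here as a minimal sketch. The function name and parameter names are hypothetical, and the calculation assumes that every buffered packet occupies a full MTU.

```python
def required_buffer_bytes(num_lanes: int, packets_per_lane: int, mtu: int) -> int:
    """Output buffer size needed to hold packets_per_lane packets per lane."""
    return num_lanes * packets_per_lane * mtu


# One 4 kB packet for each of 16 virtual lanes.
one_per_lane = required_buffer_bytes(num_lanes=16, packets_per_lane=1, mtu=4096)
print(one_per_lane // 1024, "kB")              # 64 kB, versus the allowed 16 kB
```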
The problem is further compounded if storage of multiple packets for each virtual lane is preferred in the case where the link layer utilizes a virtual lane high/low priority table. In particular, each entry in the virtual lane high/low priority table specifies the virtual lane that should be serviced next, and the weight of the virtual lane, in terms of how many bytes should be transmitted onto the virtual lane before moving to the next entry in the table. Hence, it may be desirable that the output buffer stores more than one packet for each virtual lane, to enable each virtual lane to utilize the bandwidth allocated according to the virtual lane high/low priority table. For example, if four packets (each having an MTU size of 4 kB) were allocated to each of the 16 virtual lanes, the resulting output buffer size would be 256 kB, substantially higher than the 16 kB buffer contemplated for the embedded HCA.
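The effect of the table weights on buffer occupancy may be illustrated by the following simplified sketch. It models a single table rather than separate high and low priority tables, expresses weights directly in bytes, and always sends whole packets while any weight remains; these simplifications, together with the function name and data layout, are assumptions and do not reproduce the arbitration rules of the InfiniBand™ Architecture Specification.

```python
from collections import deque


def arbitrate(table, queues):
    """Make one pass over the table and return the transmit order.

    table:  list of (virtual_lane, weight_bytes) entries, serviced in order.
    queues: dict mapping virtual_lane to a deque of buffered packet lengths.
    """
    order = []
    for vl, weight_bytes in table:
        sent = 0
        queue = queues.get(vl)
        # Keep sending from this lane until its weight is used up
        # or no more packets are buffered for it.
        while queue and sent < weight_bytes:
            packet_len = queue.popleft()
            order.append((vl, packet_len))
            sent += packet_len
    return order


table = [(0, 8192), (1, 4096)]
queues = {
    0: deque([4096]),            # only one packet buffered for lane 0
    1: deque([4096, 4096]),      # two packets buffered for lane 1
}
order = arbitrate(table, queues)
print(order)
```

In this example lane 0 is entitled to 8192 bytes but has only one 4096-byte packet buffered, so half of its allocation goes unused in the pass, which is the reason for buffering more than one packet per virtual lane.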