Existing networking and interconnect technologies have failed to keep pace with the development of computer systems, resulting in increased burdens being imposed upon data servers, application processing and enterprise computing. This problem has been exacerbated by the popular success of the Internet. A number of computing technologies implemented to meet computing demands (e.g., clustering, fail-safe and 24×7 availability) require increased capacity to move data between processing nodes (e.g., servers), as well as within a processing node between, for example, a Central Processing Unit (CPU) and Input/Output (I/O) devices.
With a view to meeting the above-described challenges, a new interconnect technology, called InfiniBand™, has been proposed for interconnecting processing nodes and I/O nodes to form a System Area Network (SAN). This architecture has been designed to be independent of a host Operating System (OS) and processor platform. The InfiniBand™ Architecture (IBA) is centered around a point-to-point, switched IP fabric whereby end node devices (e.g., inexpensive I/O devices such as a single chip SCSI or Ethernet adapter, or a complex computer system) may be interconnected utilizing a cascade of switch devices. The InfiniBand™ Architecture is defined in the InfiniBand™ Architecture Specification Volume 1, Release 1.0, released Oct. 24, 2000 by the InfiniBand Trade Association. The IBA supports applications ranging from backplane interconnects of a single host to complex system area networks, as illustrated in FIG. 1 (prior art). In a single host environment, each IBA switched fabric may serve as a private I/O interconnect for the host, providing connectivity between a CPU and a number of I/O modules. When deployed to support a complex system area network, multiple IBA switched fabrics may be utilized to interconnect numerous hosts and various I/O units.
Within a switch fabric supporting a System Area Network, such as that shown in FIG. 1, there may be a number of devices having multiple input and output ports through which data (e.g., packets) is directed from a source to a destination. Such devices include, for example, switches, routers, repeaters and adapters (exemplary interconnect devices). Where data is processed through a device, it will be appreciated that multiple data transmission requests may compete for resources of the device. For example, where a switching device has multiple input ports and output ports coupled by a crossbar, packets received at multiple input ports of the switching device, and requiring direction to specific output ports of the switching device, compete for at least input, output and crossbar resources.
In order to accommodate multiple demands on device resources, an arbitration scheme is typically employed to arbitrate between competing requests for device resources. Such arbitration schemes are typically either (1) distributed arbitration schemes, whereby the arbitration process is distributed among multiple nodes, associated with respective resources, throughout the device or (2) centralized arbitration schemes, whereby arbitration requests for all resources are handled at a central arbiter. An arbitration scheme may further employ one of a number of arbitration policies, including a round robin policy, a first-come-first-serve policy, a shortest message first policy or a priority based policy, to name but a few.
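By way of illustration only, the round robin policy named above can be sketched in a few lines of Python. The function name `round_robin_grant` and its parameters are hypothetical and are not drawn from the IBA Specification. The sketch assumes a single central arbiter that sees one request flag per requester.

```python
from typing import Optional, Sequence


def round_robin_grant(requests: Sequence[bool], last_granted: int) -> Optional[int]:
    """Return the index of the next requester to grant, or None if idle.

    The search starts one position after the previous winner and wraps
    around, so every other requester is considered before the previous
    winner can be granted again.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None


# Requesters 0 and 2 are asking; requester 0 won the previous round.
winner = round_robin_grant([True, False, True], last_granted=0)
print(winner)
```

A priority based policy would differ only in the search order: the loop would always start from the highest-priority requester rather than from the position following the previous winner.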
The physical properties of the IBA interconnect technology have been designed to support both module-to-module (board) interconnects (e.g., computer systems that support I/O module add-in slots) and chassis-to-chassis interconnects, so as to interconnect computer systems, external storage systems and external LAN/WAN access devices. For example, an IBA switch may be employed as interconnect technology within the chassis of a computer system to facilitate communications between devices that constitute the computer system. Similarly, an IBA switched fabric may be employed within a switch, or router, to facilitate network communications between network systems (e.g., processor nodes, storage subsystems, etc.). To this end, FIG. 1 illustrates an exemplary System Area Network (SAN), as provided in the InfiniBand Architecture Specification, showing the interconnection of processor nodes and I/O nodes utilizing the IBA switched fabric.
The IBA Specification discusses the implementation of multiple data virtual lanes (VLs), and an arbitration scheme to be employed when arbitrating between packets on the multiple data virtual lanes. The proposed arbitration scheme is a two-level scheme, which utilizes preemptive scheduling layered on top of a weighted fair scheme. The scheme provides a method to ensure the progress of requests on low-priority virtual lanes. The weighting, prioritization and minimum forward progress bandwidth are each programmable.
FIG. 2A illustrates a virtual lane arbitration table 11 that may be utilized according to the IBA Specification to control virtual lane arbitration. The arbitration table 11 consists of three components, namely a high-priority list 12, a low-priority list 13, and a limit of high-priority component 14. The high-priority list 12 has a minimum length of 1 entry, and a maximum length of 64 entries. The low-priority list 13 has a minimum length equal to the number of the data virtual lanes supported by an interconnect device, and a maximum length of 64 entries.
Each entry of the high-priority list 12 and the low-priority list 13 contains a virtual lane number 15 (e.g., a value from 0–14 for a used entry, or a value of 15 to indicate an unused entry) and a weight value 16 (e.g., a value 0–255) indicating the number of 64-byte units that may be transmitted via the relevant virtual lane when selected during an arbitration process. A weight value 16 of zero indicates that the relevant entry within the arbitration table 11 should be skipped.
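The layout of the arbitration table 11 described above may be modeled, purely as a sketch, by the following Python data classes. The class and field names (`ArbEntry`, `ArbTable`, `num_data_vls`) are hypothetical, as is the choice to treat an unused entry (virtual lane number 15) as inactive in the same way as an entry with a weight of zero. The numeric ranges are those given above.

```python
from dataclasses import dataclass
from typing import List

UNUSED_VL = 15     # virtual lane number marking an unused entry
MAX_ENTRIES = 64   # maximum length of either priority list
WEIGHT_UNIT = 64   # bytes represented by one unit of weight


@dataclass(frozen=True)
class ArbEntry:
    vl: int      # 0-14 for a used entry, 15 for an unused entry
    weight: int  # 0-255, in 64-byte units; 0 means "skip this entry"

    def __post_init__(self) -> None:
        if not 0 <= self.vl <= 15:
            raise ValueError("virtual lane number must be in 0..15")
        if not 0 <= self.weight <= 255:
            raise ValueError("weight must be in 0..255")

    @property
    def is_active(self) -> bool:
        # Assumption: unused entries are skipped like zero-weight entries.
        return self.vl != UNUSED_VL and self.weight > 0

    @property
    def byte_allowance(self) -> int:
        return self.weight * WEIGHT_UNIT


@dataclass
class ArbTable:
    high: List[ArbEntry]         # high-priority list 12
    low: List[ArbEntry]          # low-priority list 13
    limit_of_high_priority: int  # component 14
    num_data_vls: int            # hypothetical: data VLs the device supports

    def __post_init__(self) -> None:
        if not 1 <= len(self.high) <= MAX_ENTRIES:
            raise ValueError("high-priority list needs 1..64 entries")
        if not self.num_data_vls <= len(self.low) <= MAX_ENTRIES:
            raise ValueError("low-priority list needs num_data_vls..64 entries")
        if not 0 <= self.limit_of_high_priority <= 255:
            raise ValueError("limit of high-priority must be in 0..255")


entry = ArbEntry(vl=0, weight=255)
print(entry.byte_allowance)  # 255 units of 64 bytes each
```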
The limit of high-priority component 14 indicates the number of high-priority packets that can be transmitted without an opportunity to send a low-priority packet. Specifically, the number of bytes that can be sent is the “limit of high-priority” value times 4 K bytes, with the counting done in the same manner described above for weight values 16. In other words, the calculation is done in 64-byte increments, and a high-priority packet can be sent if a current byte count has not exceeded the limit of high-priority value. A limit of high-priority value of 255 indicates that the byte limit is unbounded, in which case there is no guarantee of forward progress for low-priority packets. A limit of high-priority value of zero indicates that only a single packet may be sent from the high-priority list 12 before an opportunity is given to the low-priority list 13.
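The arithmetic described above may be sketched as follows. The function names are hypothetical. The sketch further assumes that "4 K bytes" means 4096 bytes, that the limit is an 8-bit value, and that packet lengths are rounded up to the next 64-byte increment when counted; the rounding direction is not stated above.

```python
from typing import Optional

HIGH_PRIORITY_UNIT = 4096  # bytes per unit of the limit value (assumed 4 K = 4096)
COUNT_UNIT = 64            # granularity of the byte count
UNBOUNDED = 255            # limit value meaning "no byte limit"


def high_priority_byte_limit(limit_value: int) -> Optional[int]:
    """Byte limit for a given limit value, or None when unbounded."""
    if not 0 <= limit_value <= 255:
        raise ValueError("limit of high-priority is assumed to be in 0..255")
    if limit_value == UNBOUNDED:
        return None
    return limit_value * HIGH_PRIORITY_UNIT


def counted_bytes(packet_len: int) -> int:
    """Packet length as counted, rounded up to 64 bytes (assumed rounding)."""
    units = -(-packet_len // COUNT_UNIT)  # ceiling division
    return units * COUNT_UNIT


def may_send_high_priority(bytes_counted: int, limit_value: int) -> bool:
    """True while the running byte count has not exceeded the limit."""
    limit = high_priority_byte_limit(limit_value)
    return limit is None or bytes_counted <= limit


# With a limit value of zero, exactly one packet gets through: the count
# starts at 0 (not exceeded), and any counted packet then exceeds the limit.
count = 0
print(may_send_high_priority(count, 0))
count += counted_bytes(100)
print(may_send_high_priority(count, 0))
```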
FIG. 2B illustrates a populated arbitration table 11 for a port supporting a number of virtual lanes, virtual lanes 0–4 having been allocated bandwidth as indicated by respective weight values 16. It will be noted that multiple entries for a single virtual lane (e.g., virtual lane 0) may appear within the arbitration table 11. In this way, virtual lane 0 is allocated bandwidth at periodic intervals, to provide continuity of service to this virtual lane. For example, virtual lane 0 may be dedicated to the transport of video data, in which case it may be desirable to allocate bandwidth on a periodic basis to this virtual lane. Alternatively, the arbitration table 11 could be programmed to contain only a single entry for virtual lane 0, but the weight value 16 attributed thereto could be increased (e.g., to a value of 4). This would result in the same effective bandwidth allocation to virtual lane 0, but would be better suited to an application where burst service (as opposed to continuous stream service) was required.
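The equivalence in effective bandwidth noted above can be checked with a short calculation. The two tables below are hypothetical and are not the contents of FIG. 2B; they are chosen only so that virtual lane 0 has four entries of weight 1 in one table and a single entry of weight 4 in the other. The sketch assumes that every lane always has traffic pending, so that each lane's share is simply its total weight divided by the sum of all weights.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Tuple

UNUSED_VL = 15  # virtual lane number marking an unused entry


def bandwidth_shares(entries: List[Tuple[int, int]]) -> Dict[int, Fraction]:
    """Map each virtual lane to its fraction of the total table weight."""
    totals: Dict[int, int] = defaultdict(int)
    for vl, weight in entries:
        if vl != UNUSED_VL and weight > 0:  # unused and zero-weight entries add nothing
            totals[vl] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {vl: Fraction(w, grand_total) for vl, w in totals.items()}


# Hypothetical tables of (virtual lane, weight) pairs.
interleaved = [(0, 1), (1, 2), (0, 1), (2, 2), (0, 1), (3, 2), (0, 1), (4, 2)]
burst = [(0, 4), (1, 2), (2, 2), (3, 2), (4, 2)]

print(bandwidth_shares(interleaved)[0])
print(bandwidth_shares(interleaved) == bandwidth_shares(burst))
```

The shares are identical, but the interleaved table revisits virtual lane 0 after every other entry, whereas the burst table serves it once per pass with a four-unit allowance.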
FIG. 2B also illustrates that the arbitration table 11 is referenced by an index pointer, which points to a currently selected virtual lane. Upon notification of a pending request, an arbitration mechanism, seeking to identify the next virtual lane to be serviced, performs a walk through the arbitration table 11 in a direction indicated by the arrow 17 to identify a next virtual lane for which a request is pending and that has sufficient credits to service the request. Consider the situation illustrated in FIG. 2B, where the walk through the table commences at entry 0, and a pending request requires service on virtual lane 4. Because the walk through the arbitration table 11 may require one clock cycle per entry, the walk through the example arbitration table 11 to entry 9 could take up to 9 clock cycles. As stated above, the arbitration table 11 may have up to 64 entries. It will be appreciated that in situations where the index pointer is far removed from an entry for a virtual lane for which a request is pending, a walk through the arbitration table 11 may consume a significant number of clock cycles before the relevant entry is encountered.
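The cost of the sequential walk described above can be illustrated with a small model. The function `walk_table` and the table contents are hypothetical; the table is not that of FIG. 2B and is arranged only so that the first entry for virtual lane 4 sits at entry 9. The sketch assumes one clock cycle per entry advanced, wrap-around at the end of the table, and that pending requests and available credits are known as simple sets of virtual lane numbers.

```python
from typing import List, Optional, Set, Tuple

UNUSED_VL = 15  # virtual lane number marking an unused entry


def walk_table(
    entries: List[Tuple[int, int]],
    start: int,
    pending: Set[int],
    has_credit: Set[int],
) -> Tuple[Optional[int], int]:
    """Walk from `start`; return (matching entry index, cycles consumed).

    An entry matches when it is in use, has a nonzero weight, and names a
    virtual lane with a pending request and sufficient credits. If no entry
    matches, the index is None and a full pass has been spent.
    """
    n = len(entries)
    for step in range(n):
        idx = (start + step) % n  # assumed wrap-around
        vl, weight = entries[idx]
        if vl != UNUSED_VL and weight > 0 and vl in pending and vl in has_credit:
            return idx, step      # one cycle per entry advanced (assumed)
    return None, n


# Hypothetical 10-entry table of (virtual lane, weight) pairs.
table = [(0, 1), (1, 2), (0, 1), (2, 2), (0, 1),
         (3, 2), (0, 1), (1, 1), (0, 1), (4, 2)]

index, cycles = walk_table(table, start=0, pending={4}, has_credit={4})
print(index, cycles)
```

Under the same assumptions, a fully populated 64-entry table whose only matching entry is the last one would cost 63 cycles from entry 0, which is the latency concern raised above.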