This invention relates to network server input/output (I/O) architecture and, in particular, to aggregating bandwidth between a network server's central processing unit (CPU) and its I/O system by bundling multiple physical links.
The Next Generation I/O (NGIO) architecture, as described in Next Generation I/O Link Architecture Specification: Link Specification, published Mar. 26, 1999, is a channel-oriented, switched-fabric, serial point-to-point link architecture aimed at meeting the growing needs of I/O reliability, scalability and performance on servers. NGIO introduces the use of an extremely efficient engine that is directly coupled to host memory and replaces shared buses with a fabric of switchable point-to-point links. This approach decouples the CPU from the I/O subsystem, in contrast to today's load/store memory-mapped I/O, and addresses the problems of reliability, scalability, modular packaging, performance and complexity. CPU communication with peripherals occurs asynchronously: the I/O channel engine is responsible for moving data to and from main memory, allowing the bus to act as a switch with point-to-point links capable of near-linear scaling with CPU, memory and peripheral performance improvements. The use of standard, off-the-shelf components, such as the link physicals, also permits this architecture to scale as higher bit rates become available, providing backward compatibility and investment protection.
NGIO link architecture provides a method called Multiple Link Expansion (MLX) that aggregates the bandwidth of multiple parallel links to increase bandwidth and reduce latency. MLX allows multiple links to be connected between two devices. These parallel links can be bundled through MLX to work in concert as a single high bandwidth link.
During transmission, MLX transmits cells across bundled links. A bundle is made up of an ordered set of links. The transmitter must initiate cells across links of a bundle in a specified, round-robin order. The receiver knows this order and expects the cells to be distributed in that order.
FIG. 1 is a block diagram of the prior art showing a normal transmission in round-robin fashion over bundled links. Here, transmitter 100 transmits three packets, A, B, and C. Each packet is segmented into appropriately sized cells. Packet A, consisting of two cells (A1 and A2), packet B, consisting of three cells (B1, B2, and B3), and packet C, consisting of a single cell (C1), are then sent from transmitter 100 across bundled links connected to ports 104, 105, and 106. The order of the bundled links has been specified by a fabric manager as 104, 105 and 106.
A cell is first dispatched from the port at the top of the round-robin order, here 104. The next port in the round-robin order, 105, if staged with a cell, A2, may start transmitting that cell after the previous port, 104, has started to dispatch the previous cell, A1.
When the round-robin order reaches the last port in the order, 106, it wraps to the first port in the bundle, 104. To dispatch a cell, each port must wait until the previous port starts transmitting the previous cell. A port may start transmission at the same time as the previous port but must not transmit any earlier. FIG. 2 is a block diagram of the prior art showing such a simultaneous transmission. Transmission does not occur until the port satisfies the minimum inter-cell gap requirement.
All ports of the bundle must follow this MLX order behavior. If the next port in the round-robin order does not have a cell staged, the round-robin ordering does not advance.
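The transmit ordering rules above can be illustrated with a minimal sketch. It assumes that cells are staged in sequence, that every cell takes the same time on the wire, and that times are in arbitrary units. The function name `schedule_starts` and the parameters `cell_time` and `min_gap` are hypothetical and do not come from the NGIO specification. The assignment of cells B1 through C1 to ports is inferred from the round-robin rule rather than read from FIG. 1.

```python
def schedule_starts(cells, ports, cell_time=10, min_gap=2):
    """Assign cells to ports in round-robin order and compute start times.

    A port may start no earlier than the previous port started the
    previous cell, and no earlier than its own previous cell has ended
    plus the minimum inter-cell gap (both values are assumed units).
    """
    port_free = {p: 0 for p in ports}  # earliest time each port may start again
    prev_start = 0                     # start time of the previously dispatched cell
    schedule = []
    for i, cell in enumerate(cells):
        # The order advances only when a cell is dispatched, and it wraps
        # from the last port back to the first.
        port = ports[i % len(ports)]
        start = max(prev_start, port_free[port])
        schedule.append((cell, port, start))
        port_free[port] = start + cell_time + min_gap
        prev_start = start
    return schedule


schedule = schedule_starts(["A1", "A2", "B1", "B2", "B3", "C1"], [104, 105, 106])
for cell, port, start in schedule:
    print(f"cell {cell} on port {port} starts at t={start}")
```

In this sketch the first three cells start simultaneously, which the ordering rule permits, and the second round is delayed only by the inter-cell gap on each port.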
The receiver expects cells to arrive in a specified, round-robin order. FIG. 3 is a block diagram of the prior art showing a typical reception in round-robin fashion over bundled links. Here, packet A consisting of four cells, A1-A4, is sent to receiver 113 across a four-link bundle consisting of links 108, 109, 110 and 111. The reception sequence in the example is ports 114, 115, 116, then 117. Port 114 is currently at the top of the receive round-robin order. All sequence numbers were correctly applied by the transmitter. Each cell follows MLX transmission order from the perspective of the transmitter.
Under MLX, cells must be received in the expected round-robin order. Links bundled together under MLX must use the same transmission rate. However, even links with the same transmission rate may have different flight times due to physical characteristics such as length. Therefore, cells may sometimes be received in violation of the ordering requirement.
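The effect of unequal flight times can be shown with a small sketch. The flight-time values, the time units, and the function names `arrival_order` and `first_violation` are illustrative assumptions. The port numbers follow the FIG. 3 example.

```python
def arrival_order(sends, flight_time):
    """Return (cell, port) pairs sorted by arrival time at the receiver.

    sends: list of (cell, port, start_time) in transmit order.
    flight_time: mapping of port -> link flight time (assumed units).
    """
    arrivals = [
        (start + flight_time[port], seq, cell, port)
        for seq, (cell, port, start) in enumerate(sends)
    ]
    arrivals.sort()  # ties keep transmit order because seq is the second key
    return [(cell, port) for _, _, cell, port in arrivals]


def first_violation(arrived, expected_ports):
    """Return the index of the first cell that arrives out of turn, or None."""
    for i, (_, port) in enumerate(arrived):
        if port != expected_ports[i % len(expected_ports)]:
            return i
    return None


ports = [114, 115, 116, 117]
sends = [("A1", 114, 0), ("A2", 115, 0), ("A3", 116, 0), ("A4", 117, 0)]

equal = arrival_order(sends, {114: 5, 115: 5, 116: 5, 117: 5})
skewed = arrival_order(sends, {114: 5, 115: 3, 116: 5, 117: 5})  # shorter link on 115

print(first_violation(equal, ports))
print(first_violation(skewed, ports))
```

With equal flight times the cells arrive in the expected order. When the link to port 115 is shorter, cell A2 arrives before A1 even though the transmitter obeyed the ordering rule.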
According to one aspect of the invention, a method of combining multiple parallel links between a server's CPU and its I/O system into a single channel is provided. The various links of the bundle are handled in a round-robin order. Variations in flight time between the various links are compensated for through a timer at each receive port of the bundle.
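The text here does not say how the receive-port timer operates, so the following is only one hypothetical scheme and not necessarily the method of the invention. It assumes that a cell arriving ahead of its turn is held, and that the cell due on the expected port is accepted only if it arrives within a skew budget of the earliest held cell. The function `receive_in_order` and the parameter `max_skew` are invented for illustration.

```python
from collections import deque


def receive_in_order(arrivals, ports, max_skew):
    """Deliver cells in round-robin port order, tolerating bounded skew.

    arrivals: list of (time, port, cell) tuples.
    ports: receive ports in round-robin order.
    max_skew: assumed timer budget; how long early cells may be held
              while waiting for the cell due on the expected port.
    """
    queues = {p: deque() for p in ports}
    for time, port, cell in sorted(arrivals):
        queues[port].append((time, cell))

    delivered = []
    turn = 0
    while any(queues.values()):
        port = ports[turn % len(ports)]
        if not queues[port]:
            raise RuntimeError(f"no cell arrived on expected port {port}")
        time, cell = queues[port].popleft()
        # Arrival times of the cells currently held on the other ports.
        held = [q[0][0] for p, q in queues.items() if p != port and q]
        if held and time - min(held) > max_skew:
            raise TimeoutError(f"cell {cell} on port {port} exceeded the skew budget")
        delivered.append(cell)
        turn += 1
    return delivered


arrivals = [(3, 115, "A2"), (5, 114, "A1"), (5, 116, "A3"), (5, 117, "A4")]
print(receive_in_order(arrivals, [114, 115, 116, 117], max_skew=4))
```

In this sketch a skew of two time units is absorbed when the budget is four, while a budget of one would cause the same arrivals to be reported as a timeout.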