1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and apparatus for processing data. Still more particularly, the present invention relates to a method, apparatus, and computer instructions for transferring data using link aggregation.
2. Description of Related Art
With the increasing demand for higher rates of data transfer, 1 Gbit/sec is not enough bandwidth for many network connections between local area network (LAN) switches and from switches to high-demand network servers. In addition to the bandwidth-consuming applications at Internet service providers (ISPs), application service providers, streaming media providers, and the like, traditional network administrators may also be feeling the bandwidth pinch at their server connections. Trunking, or link aggregation, has been used to increase bandwidth. Link aggregation allows a data processing system to treat more than one network interface as a single network interface. In other words, a number of different links between data processing systems may be “aggregated” into a single link.
In addition to increased bandwidth, link aggregation provides increased reliability. Traditionally, aggregating more than one network interface has required manual intervention from the network administrator. The administrator must specify the interfaces to be aggregated both on the host (e.g., an AIX server) and on the switch to which the network adapters are connected. This specification is necessary because the switch needs to know that traffic addressed to the link aggregation can be sent over any of the adapters belonging to the aggregation.
Efforts have been made to automate the creation of link aggregations, such as the IEEE 802.3ad standard. This standard defines a Link Aggregation Control Protocol (LACP) whereby the network host and the switch exchange Link Aggregation Control Protocol Data Unit (LACPDU) packets to decide which adapters are to be aggregated together. Intrinsic properties of the adapters (such as duplex mode and link speed) are used to decide which adapters belong to the same link aggregation.
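The grouping idea described above can be illustrated with a minimal sketch. It assumes that adapters sharing the same link speed and duplex mode are candidates for the same aggregation. The names `Adapter`, `group_by_properties`, and the adapter names are hypothetical, and the sketch does not model the LACPDU exchange or any other part of the actual protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Adapter:
    """Hypothetical description of a network adapter's intrinsic properties."""
    name: str
    speed_mbps: int
    full_duplex: bool


def group_by_properties(adapters):
    """Group adapter names by (link speed, duplex mode).

    Assumption for illustration only: adapters with identical intrinsic
    properties are candidates for the same link aggregation.
    """
    groups = defaultdict(list)
    for adapter in adapters:
        groups[(adapter.speed_mbps, adapter.full_duplex)].append(adapter.name)
    return dict(groups)


adapters = [
    Adapter("ent0", 1000, True),
    Adapter("ent1", 1000, True),
    Adapter("ent2", 100, True),
]
candidate_groups = group_by_properties(adapters)
```

In this sketch, the two 1000 Mbit/sec full-duplex adapters fall into one candidate group, while the 100 Mbit/sec adapter is placed in a group of its own.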
The IEEE 802.3ad standard specifies that all packets belonging to the same conversation must be sent over the same adapter to prevent packet reordering at the link level. How this is achieved is implementation-dependent. A conversation is a transfer of related data between two endpoints. An example of a conversation is a session between two hosts. A session is the active connection between two data processing systems. Furthermore, the host and the switch can use different schemes to decide over which adapter the packets belonging to the same conversation are sent. As a result, it is quite possible that data packets sent from the host to the switch are sent over one adapter, while reply data packets sent from the switch back to the host are sent over another adapter. The standard allows this behavior because packet reordering does not occur in either direction. In traditional network stacks, this situation is not an issue.
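The asymmetry described above can be sketched as follows. Both selection schemes are hypothetical and chosen only for illustration: the host is assumed to select an adapter from the destination port, and the switch from the last octet of the destination IP address. Neither scheme is taken from the standard or from any particular product.

```python
NUM_ADAPTERS = 2  # assumed size of the link aggregation


def host_select(src_ip, dst_ip, src_port, dst_port):
    """Hypothetical host scheme: adapter index from the destination port."""
    return dst_port % NUM_ADAPTERS


def switch_select(src_ip, dst_ip, src_port, dst_port):
    """Hypothetical switch scheme: adapter index from the last octet
    of the destination IP address."""
    last_octet = int(dst_ip.split(".")[-1])
    return last_octet % NUM_ADAPTERS


# Outgoing packet: host 10.0.0.1:49153 -> server 10.0.0.2:80
out_adapter = host_select("10.0.0.1", "10.0.0.2", 49153, 80)

# Reply packet: server 10.0.0.2:80 -> host 10.0.0.1:49153
reply_adapter = switch_select("10.0.0.2", "10.0.0.1", 80, 49153)

print(out_adapter, reply_adapter)  # prints "0 1"
```

Each scheme is deterministic, so every packet of the conversation travels over the same adapter in a given direction and no reordering occurs. The two directions nevertheless use different adapters.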
However, when dealing with transmission control protocol (TCP)-offloaded adapters, such a situation could potentially become a problem. In TCP-offloaded adapters, the TCP/Internet Protocol (TCP/IP) stack is implemented in the adapter's hardware. Thus, the state for all the TCP connections going over a specific adapter is contained in that adapter, and not in a system-wide TCP layer that is shared among all the connections on the same host.
This configuration makes it imperative that reply data packets are received on the same adapter over which the outgoing data packets were sent, because only that adapter holds the TCP state necessary to accept and process the reply packets. For example, if a data packet is sent on adapter 1 but its reply is received on adapter 2, the latter adapter does not have the TCP state necessary to process the reply packet. As a result, the reply would be discarded. Examples of TCP state information include the expected sequence number and timeout information.
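The failure mode described above can be modeled with a minimal sketch. The `OffloadAdapter` class, its methods, and the connection key layout are hypothetical simplifications. Only the expected sequence number is tracked, and no real TCP processing is performed.

```python
class OffloadAdapter:
    """Hypothetical model of a TCP-offloaded adapter that keeps
    connection state locally rather than in a shared TCP layer."""

    def __init__(self, name):
        self.name = name
        # Maps a connection key to its locally held TCP state.
        self.connections = {}

    def send(self, conn, expected_reply_seq):
        """Record the state for a connection whose packet leaves on this adapter."""
        self.connections[conn] = {"expected_seq": expected_reply_seq}

    def receive(self, conn, seq):
        """Return True if the reply is accepted, False if it is discarded."""
        state = self.connections.get(conn)
        if state is None:
            # This adapter never saw the connection, so it has no TCP state.
            return False
        return seq == state["expected_seq"]


adapter1 = OffloadAdapter("adapter 1")
adapter2 = OffloadAdapter("adapter 2")

# Connection key: (local IP, local port, remote IP, remote port)
conn = ("10.0.0.1", 49153, "10.0.0.2", 80)
adapter1.send(conn, expected_reply_seq=1001)

reply_on_adapter2 = adapter2.receive(conn, seq=1001)  # no state: discarded
reply_on_adapter1 = adapter1.receive(conn, seq=1001)  # state present: accepted
```

In this model, the same reply packet is discarded when it arrives on adapter 2 and accepted when it arrives on adapter 1, because only adapter 1 holds the state for the connection.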
In the automated link aggregation standards in existence, no way is present for a host and a switch to negotiate which algorithm should be employed to decide which adapter should be used to send packets belonging to the same conversation. Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for forming a link aggregation.