This invention is directed to the implementation of link aggregation (also known as trunking, or inverse multiplexing) in Ethernet frame switches. A hardware and firmware combination distributes frames across parallel links without misordering problems.
Link aggregation technology termed "inverse multiplexing" has been used for some time in wide-area networks, but has been adopted only recently (as "trunking") in state-of-the-art Ethernet frame switches. Link aggregation provides redundancy and load balancing across medium access control (MAC) entities connecting Ethernet switches to each other, or to high-speed server computers.
The technique consists of establishing multiple, parallel physical links between two entities that must communicate with each other (e.g., switches, routers and/or network servers), and then logically binding these parallel links into a single logical link having a higher effective bandwidth than any one physical link. The source entity separates the packets of a single packet stream and distributes them across the physical links using some well-defined algorithm; the destination entity subsequently recombines them into a single stream. Note that link aggregation does not encompass schemes for segmenting packets into smaller units and distributing them across multiple links; it is assumed that packets are transmitted in their entirety on specific physical links.
A typical prior art Ethernet link aggregation implementation utilizes a hardware means for distributing packets across multiple physical links, and re-aggregating them at the receiving end. This is typically due to the high speeds involved (100 Mb/s or even 1000 Mb/s per link) in the packet transfer. The use of such hardware is expensive in terms of the silicon resources required to perform the distribution and collection functions, and is also inflexible in terms of the algorithms used to determine how packets may be distributed across links. Additionally, the complexity of the distribution function when accounting for the various packet ordering and sequencing requirements of the Ethernet protocol renders a hardware-only approach difficult to design and debug. A well-partitioned, mixed hardware/firmware approach is preferable when implementing link aggregation at high speeds. This approach permits high speeds to be attained while preserving flexibility in implementation, which is necessary for tracking changing standards or implementing different distribution algorithms.
The preferred link aggregation scheme should satisfy the following objectives:
1. The link aggregation distribution algorithm must not re-order frames belonging to the same connection. In this context, a "connection" is a particular combination of source and destination MAC addresses obtained from the Ethernet frame header.
2. The link aggregation mechanism should distribute frames across multiple parallel physical links as evenly as possible, subject to the above ordering constraint.
3. The distribution algorithm must be capable of preserving the frame ordering when a frame stream transitions from a flood (i.e., a multicast produced when the destination address within the frame is unknown) to a unicast after the destination address has been learned by the normal Ethernet bridging process.
4. The distribution algorithm must preferably distribute not only unicast traffic but also multicast traffic (i.e., traffic sent to a set of destination physical ports) across aggregated links.
5. The link aggregation scheme should use a minimum of hardware resources in order to lower cost, without sacrificing performance at the same time.
This invention provides a link aggregation algorithm embodied within a mixed hardware/firmware packet forwarding datapath that accepts incoming frames, determines whether they are destined to be transferred across an aggregated link (i.e., a single logical link consisting of multiple physical links), and then distributes them across the multiple physical links using a pre-defined distribution algorithm.
The invention facilitates distribution of data packets between one or more physical incoming ports and one or more physical outgoing ports. Packets containing source and destination addresses are received on one or more of the incoming ports. An address look-up table stores previously processed source and destination addresses, together with source and destination contexts associated with the respective source and destination addresses. The contexts represent either a specific physical port, or an aggregated grouping of ports. A distribution table stores, for each aggregated grouping of outgoing ports, a corresponding aggregated group of identifiers of specific outgoing ports.
As each packet is received, its source and destination addresses are extracted and the address look-up table is searched for those source and destination addresses. If the address look-up table contains those source and destination addresses then the source and destination contexts associated with those source and destination addresses are retrieved from the address look-up table. If the address look-up table does not contain a source address corresponding to the extracted source address, then a source context corresponding to the extracted source address is derived and stored in the address look-up table with the extracted source address.
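The look-up and learning step described above can be sketched as follows. This is a minimal illustration only: the dictionary-based table, the function name lookup_and_learn, and the derive_source_context callback are hypothetical stand-ins for the hardware look-up described in the text, and contexts are treated as opaque values.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

def lookup_and_learn(
    table: Dict[bytes, Hashable],
    src: bytes,
    dst: bytes,
    in_port: int,
    derive_source_context: Callable[[bytes, int], Hashable],
) -> Tuple[Hashable, Optional[Hashable]]:
    """Return (source context, destination context or None).

    `table` maps an address to its context (hypothetical layout).
    An unknown source address is learned; an unknown destination is
    reported as None so the caller can treat the packet as a flood.
    """
    src_ctx = table.get(src)
    if src_ctx is None:
        # Source not seen before: derive a context and store it.
        src_ctx = derive_source_context(src, in_port)
        table[src] = src_ctx
    dst_ctx = table.get(dst)
    return src_ctx, dst_ctx
```

A destination context of None signals that the destination address has not yet been learned.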
If the retrieved destination address context represents a specific outgoing port, then the received packet is queued for outgoing transmission on that port. If the retrieved destination address context represents an aggregated grouping of outgoing ports, then the identifiers for the outgoing ports comprising that grouping are retrieved from the distribution table, and the received packet is queued for outgoing transmission on all of the outgoing ports comprising that grouping.
Advantageously, the source context corresponding to an extracted source address is derived by producing a hash key through application of a hash function to the extracted source address. The incoming port on which the packet containing the extracted source address was received is identified. If the identified incoming port is within an aggregated grouping of incoming ports, then a port identifier representative of that aggregated grouping is derived. If the identified incoming port is not within an aggregated grouping of incoming ports, then a port identifier representative of the identified incoming port is derived. The hash key and the port identifier are then combined to form the source context corresponding to the extracted source address.
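One possible way to derive a source context along these lines is sketched below. Several details are assumptions not given in the text: the XOR-fold hash, the 4-bit key width, the AGGREGATE_FLAG marker bit, the PORT_TO_GROUP mapping, and the packing of the port identifier above the hash key.

```python
N_BITS = 4             # assumed hash key width N
AGGREGATE_FLAG = 0x80  # hypothetical bit marking "identifier names a grouping"

# Hypothetical configuration: incoming ports 4 and 5 form grouping 0.
PORT_TO_GROUP = {4: 0, 5: 0}

def hash_key(mac: bytes, n_bits: int = N_BITS) -> int:
    """XOR-fold the address into an n_bits-wide key (assumed hash function)."""
    value = int.from_bytes(mac, "big")
    mask = (1 << n_bits) - 1
    key = 0
    while value:
        key ^= value & mask
        value >>= n_bits
    return key

def port_identifier(in_port: int) -> int:
    """Identifier of the grouping if the port is aggregated, else of the port."""
    group = PORT_TO_GROUP.get(in_port)
    if group is not None:
        return AGGREGATE_FLAG | group
    return in_port

def derive_source_context(src_mac: bytes, in_port: int) -> int:
    """Combine port identifier (upper bits) and hash key (lower N bits)."""
    return (port_identifier(in_port) << N_BITS) | hash_key(src_mac)
```

Under these assumptions, an address learned on port 4 or on port 5 yields the same context, since both ports belong to the same grouping.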
The hash function is preferably selected such that successive application of the hash function to all source and destination addresses expected to be seen by the Ethernet switch will produce a lowest value hash key, a highest value hash key, and a group of hash keys having intermediate values distributed evenly between the lowest and highest values.
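Whether a candidate hash function spreads keys evenly can be checked empirically against a sample of addresses. The sketch below uses an XOR-fold hash and a synthetic block of 256 consecutive addresses; both are illustrative assumptions, not the hash function or address population of the invention.

```python
from collections import Counter
from typing import Iterable

def hash_key(mac: bytes, n_bits: int = 4) -> int:
    """XOR-fold the address into an n_bits-wide key (assumed hash function)."""
    value = int.from_bytes(mac, "big")
    mask = (1 << n_bits) - 1
    key = 0
    while value:
        key ^= value & mask
        value >>= n_bits
    return key

def key_histogram(macs: Iterable[bytes], n_bits: int = 4) -> Counter:
    """Count how many addresses map to each hash key."""
    return Counter(hash_key(m, n_bits) for m in macs)

# Synthetic sample: 256 addresses differing only in the last byte.
sample = [bytes.fromhex("00005e0053") + bytes([i]) for i in range(256)]
histogram = key_histogram(sample)
```

A histogram with roughly equal counts for every key suggests the hash is suitable for the address population sampled; a skewed histogram suggests a different hash function should be tried.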
The distribution table contains a separate port identifier look-up table for each aggregated grouping of outgoing ports. Advantageously, the hash key is an N bit hash key; and, each port identifier look-up table contains 2^N entries occupying 2^N consecutive locations, with each entry being an identifier of a particular one of the physical outgoing ports.
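A port identifier look-up table of this shape might be populated as sketched below. The round-robin fill policy and the names build_port_table and distribution_table are assumptions; the text only requires that each of the 2^N locations hold the identifier of one physical outgoing port in the grouping.

```python
from typing import Dict, List, Sequence

def build_port_table(member_ports: Sequence[int], n_bits: int) -> List[int]:
    """Fill 2**n_bits consecutive entries with the grouping's member ports.

    Round-robin assignment (an assumed policy) keeps the number of
    entries per port within one of each other.
    """
    if not member_ports:
        raise ValueError("an aggregated grouping needs at least one port")
    size = 1 << n_bits
    return [member_ports[i % len(member_ports)] for i in range(size)]

# Hypothetical configuration: grouping 0 aggregates physical ports 4, 5 and 6.
distribution_table: Dict[int, List[int]] = {0: build_port_table([4, 5, 6], 4)}
```

Because firmware can rewrite the entries, the share of connections carried by each link can be changed without altering the look-up hardware.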
Identifiers for particular outgoing ports are retrieved from the distribution table by extracting first and second N bit hash keys which form part of the retrieved destination and source address contexts respectively. The hash keys are combined to form an N bit connection identifier. The port identifier look-up table corresponding to the aggregated grouping represented by the retrieved destination address is selected, and the entry at the table location corresponding to the value of the N bit connection identifier is retrieved.
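The selection step can be illustrated as follows. The text does not state how the two hash keys are combined; XOR is assumed here because it yields an N bit result from two N bit keys. The context layout (hash key in the low N bits, port identifier above it) and the AGGREGATE_FLAG marker are likewise hypothetical.

```python
from typing import Dict, List

N_BITS = 4                      # assumed hash key width N
KEY_MASK = (1 << N_BITS) - 1
AGGREGATE_FLAG = 0x80           # hypothetical "identifier names a grouping" bit

def connection_id(dst_ctx: int, src_ctx: int) -> int:
    """Combine the two N-bit hash keys into an N-bit identifier (XOR assumed)."""
    return (dst_ctx & KEY_MASK) ^ (src_ctx & KEY_MASK)

def select_outgoing_port(
    dst_ctx: int, src_ctx: int, distribution_table: Dict[int, List[int]]
) -> int:
    """Return the physical outgoing port for one packet."""
    port_id = dst_ctx >> N_BITS
    if not port_id & AGGREGATE_FLAG:
        return port_id  # context names a specific physical port
    group = port_id & ~AGGREGATE_FLAG
    # Index the grouping's table with the connection identifier.
    return distribution_table[group][connection_id(dst_ctx, src_ctx)]

# Hypothetical example: grouping 0 aggregates ports 4, 5 and 6.
distribution_table = {0: [(4, 5, 6)[i % 3] for i in range(1 << N_BITS)]}
dst_ctx = ((AGGREGATE_FLAG | 0) << N_BITS) | 0xC  # destination behind grouping 0
src_ctx = (2 << N_BITS) | 0x5                     # source learned on port 2
chosen = select_outgoing_port(dst_ctx, src_ctx, distribution_table)
```

Every packet of a given source and destination pair produces the same connection identifier and therefore the same table entry, which is what keeps frames of one connection on one physical link and in order.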
If the address look-up table does not contain a destination address corresponding to the extracted destination address then first and second hash keys are produced by applying a hash function to the extracted source and destination addresses respectively. The hash keys are combined to form an N bit connection identifier. The incoming port on which the packet containing the extracted source address was received is identified. All of the aggregated groupings are scanned to identify all outgoing ports to which packets may be directed from the incoming port on which the packet was received. For each one of those outgoing ports, the port identifier look-up table corresponding to the aggregated grouping containing that outgoing port is selected, the entry at the table location corresponding to the value of the N bit connection identifier is retrieved, and the received packet is queued for outgoing transmission on the outgoing port corresponding to the retrieved entry.
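The flood case can be sketched in the same spirit. The assumptions here are the XOR-fold hash, the XOR combination of the keys, the modelling of a stand-alone port as a grouping of one, and the rule that a packet is never sent back to the grouping on which it arrived.

```python
from typing import List, Sequence, Tuple

Grouping = Tuple[Tuple[int, ...], List[int]]  # (member ports, 2**N-entry table)

def hash_key(mac: bytes, n_bits: int) -> int:
    """XOR-fold the address into an n_bits-wide key (assumed hash function)."""
    value = int.from_bytes(mac, "big")
    mask = (1 << n_bits) - 1
    key = 0
    while value:
        key ^= value & mask
        value >>= n_bits
    return key

def flood_ports(
    src_mac: bytes, dst_mac: bytes, in_port: int,
    groupings: Sequence[Grouping], n_bits: int,
) -> List[int]:
    """Pick one outgoing port per eligible grouping for an unknown destination."""
    conn_id = hash_key(src_mac, n_bits) ^ hash_key(dst_mac, n_bits)
    out = []
    for members, table in groupings:
        if in_port in members:
            continue  # assumed rule: never flood back to the arrival grouping
        out.append(table[conn_id])
    return out

# Hypothetical switch: stand-alone ports 1 and 2, plus a grouping of 4, 5, 6.
N = 4
groupings = [
    ((1,), [1] * (1 << N)),
    ((2,), [2] * (1 << N)),
    ((4, 5, 6), [(4, 5, 6)[i % 3] for i in range(1 << N)]),
]
ports = flood_ports(bytes.fromhex("00005e005301"),
                    bytes.fromhex("00005e005302"), 1, groupings, N)
```

In this sketch the connection identifier is computed from the same two addresses that a later unicast look-up would use, so a flooded frame and the unicast frames that follow it would select the same table entry within a grouping.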