Computer networks comprise a plurality of interconnected networking devices, such as routers, switches and/or computers. The physical connection that allows one networking device to communicate with another networking device is referred to as a link. Links may utilize wired or wireless communication technologies. Data may be communicated between networking devices via the link in groups of binary bits referred to as packets. The rate at which networking devices may communicate data via a link is referred to as link speed.
Achieving ever higher link speeds is one of the challenges in computer networking technology. Higher link speeds correspond to higher bandwidth and correspondingly higher data rates. In pursuing the goal of higher data rates, network architects may face a number of constraints. Such higher, or “cutting edge”, data rates often require components, such as integrated circuit (IC) devices, and interconnect, such as category 6 (Cat6) or Cat7 cabling, which are more expensive than equivalent, more commonly used hardware that may not be capable of achieving the higher data rates. Thus, economic considerations potentially represent one such constraint. Various limitations dictate a speed at which components may operate, and a speed at which data may be transferred by the components and/or via interconnect.
As link speeds increase, one operational objective of networking designers may be to incrementally control the bandwidth associated with a link so that bandwidth may be deployed within the network “on demand”. The ability to adjust link bandwidth on demand is referred to as scalability. An objective of scalability is to enable adjustment of link bandwidth dynamically under software control, such as from an operations administration and maintenance (OAM) monitoring terminal.
Networks that utilize cutting edge technologies to enable the higher data rates are often used to transport data that have considerable value to the users of the networks, for example for the exchange of financial data, or for the exchange of large volumes of data between very expensive supercomputer systems. Thus, another potential operational objective is to have the ability to gradually decrease the link bandwidth in the presence of impairments that may occur on the link. This ability to operate the link at a reduced bandwidth, rather than to lose use of the link altogether, is referred to as resiliency.
One approach to overcoming some of the limitations described above is to create a logical high bandwidth link by simultaneously transmitting the data via a plurality of lower bandwidth physical links. This method is often referred to as aggregation. Aggregation creates associations between the logical link and a group of physical links. In theory, aggregation enables scalability: logical link bandwidth is increased by increasing the number of associated physical links. Thus, if each physical link has a bandwidth of 10 gigabits/second (Gb), a higher speed logical link may be created by transmitting data via two 10 Gb physical links. In theory, the bandwidth of this aggregated logical link would be 20 Gb.
Aggregation also enables resiliency through gradual decreasing of logical link bandwidth by decreasing the number of associated physical links. For example, a physical link, which experiences a failure, may be removed from association with the aggregated logical link while the remaining physical links maintain association with the logical link. Thus, in a logical link associated with two 10 Gb physical links, the logical link bandwidth may be gradually decreased by removing one of the physical links in the association.
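The relationship between member links and theoretical logical link bandwidth described above can be sketched as follows. This is a minimal illustration only; the class name LogicalLink and its methods are hypothetical and do not correspond to any standard or product.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalLink:
    """Hypothetical model of a logical link as a group of physical links."""
    # Maps a physical link name to its bandwidth in Gb.
    members: dict = field(default_factory=dict)

    def add(self, name, gb):
        # Scalability: associating another physical link raises the total.
        self.members[name] = gb

    def remove(self, name):
        # Resiliency: a failed link is dropped, the others stay associated.
        self.members.pop(name, None)

    def theoretical_bandwidth(self):
        return sum(self.members.values())


ll = LogicalLink()
ll.add("PL1", 10)
ll.add("PL2", 10)
full = ll.theoretical_bandwidth()      # both links associated
ll.remove("PL2")                       # PL2 experiences a failure
degraded = ll.theoretical_bandwidth()  # link still usable at reduced rate
```

The logical link remains in service after the removal, at the bandwidth of the remaining physical link.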
Link aggregation is one existing method for aggregation. A specification for link aggregation may be found in IEEE standard 802.3ad (IEEE 802.3ad). The IEEE 802.3ad standard specifies a method for defining a single logical link by aggregating individual physical links, each operating at data rates of 10 megabits/second (Mb) to 100 Mb, for example. However, the IEEE 802.3ad link aggregation may impose limitations on how data may be distributed among the physical links associated with a single logical link. This limitation may be related to the method in which network devices communicate within the network.
When a network device A_ and network device B_ establish a communication for the purpose of exchanging data, the communication may be referred to as a “conversation”. The conversation may be identified by a conversation identifier. For example, network device A_ may create a conversation identifier cid=1 for the conversation with B_. The conversation identifier enables network device A_ to establish and concurrently maintain other conversations in addition to the conversation cid=1 by establishing distinct conversation identifiers for each conversation (for example cid=2, cid=3, . . . ).
One limitation in IEEE 802.3ad link aggregation is that packets containing data transmitted from a source network device (for example A_) to a destination network device (for example B_), associated with a given conversation (for example cid=1), may be required to be transmitted via a single physical link, PL1, even though the logical link, LL1, may comprise a group of physical links, PL1, PL2, PL3 and PL4, for example. In this case, the network device A_ may maintain four concurrent conversations (for example cid=1, cid=2, cid=3 and cid=4) with the network device B_, wherein data associated with the conversation cid=1 may be communicated via link PL1, data associated with the conversation cid=2 may be communicated via the link PL2, data associated with the conversation cid=3 may be communicated via link PL3 and data associated with the conversation cid=4 may be communicated via link PL4. However, if the network devices A_ and B_ are engaged in a single conversation, for example the conversation cid=1, the data transfer rate between the network devices A_ and B_ may be limited by the bandwidth of link PL1. Thus, while the theoretical bandwidth of the link LL1 may be equal to the sum of bandwidths of the links PL1, PL2, PL3 and PL4, the IEEE 802.3ad link aggregation may limit the bandwidth available for a single conversation to the bandwidth of an individual link PL1, PL2, PL3 or PL4.
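The effect of pinning each conversation to one physical link may be illustrated with a short sketch. The mapping rule in select_link and the per-link rate of 10 units are assumptions made for illustration; IEEE 802.3ad does not mandate this particular selection rule.

```python
LINKS = ["PL1", "PL2", "PL3", "PL4"]
LINK_RATE = 10  # assumed bandwidth of each physical link, arbitrary units


def select_link(cid, links=LINKS):
    # Hypothetical rule: cid=1 maps to PL1, cid=2 to PL2, and so on.
    # Every packet of a given conversation always maps to the same link.
    return links[(cid - 1) % len(links)]


def usable_bandwidth(cids, links=LINKS, rate=LINK_RATE):
    # Only links that carry at least one conversation contribute.
    used = {select_link(cid, links) for cid in cids}
    return len(used) * rate


four_conversations = usable_bandwidth([1, 2, 3, 4])
single_conversation = usable_bandwidth([1])
```

With four conversations every member link is used, whereas a single conversation can use only the bandwidth of PL1 regardless of how many links are associated with LL1.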
One potential reason for this limitation is that the IEEE 802.3ad standard may impose temporal ordering restrictions on packets transferred via the logical link. These temporal ordering restrictions mean that if the network device A_ transmits data in a sequence of packets P1, followed by P2, followed by P3 and followed by P4 during the course of a conversation, the packets must be received at the network device B_ in the order P1, followed by P2, followed by P3 and followed by P4. If the packets are transmitted via the same link PL1, temporal ordering may be preserved for packets transmitted from network device A_ to network device B_. However, if the packets are distributed among the links, for example the packet P1 transmitted via the link PL1, the packet P2 transmitted via the link PL2, the packet P3 transmitted via the link PL3 and the packet P4 transmitted via the link PL4, the temporal ordering cannot be guaranteed. For example, the network device B_ may receive the packets in the following order: P1 via PL1, P3 via PL3, P2 via PL2 and P4 via PL4. Receipt of the packets in the aforementioned order may violate the temporal ordering restrictions which may be required for IEEE 802.3ad link aggregation.
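The reordering described above may be reproduced with a small simulation. The per-link delays are invented values chosen only to illustrate the effect, and the function arrival_order is hypothetical.

```python
def arrival_order(packets, link_of, delay_of):
    """Return packet names in order of arrival at the destination.

    Packets are sent one time unit apart, in list order. Each packet
    arrives at its send time plus the delay of the link that carries it.
    """
    arrivals = []
    for send_time, name in enumerate(packets):
        link = link_of[name]
        arrivals.append((send_time + delay_of[link], send_time, name))
    arrivals.sort()
    return [name for _, _, name in arrivals]


packets = ["P1", "P2", "P3", "P4"]
delay = {"PL1": 1, "PL2": 5, "PL3": 1, "PL4": 4}  # assumed link delays

# All packets of the conversation on PL1: order is preserved.
same_link = arrival_order(packets, {p: "PL1" for p in packets}, delay)

# Packets spread across PL1..PL4: P3 overtakes P2.
spread = arrival_order(
    packets,
    {"P1": "PL1", "P2": "PL2", "P3": "PL3", "P4": "PL4"},
    delay,
)
```

Under these assumed delays the single-link case delivers P1, P2, P3, P4, while the distributed case delivers P1, P3, P2, P4.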
Thus, IEEE 802.3ad link aggregation may not exhibit the property of scalability, since adding physical links may not result in a linear increase of logical link bandwidth due to temporal ordering restrictions.
Within the protocol reference model (PRM) specified by the International Organization for Standardization (ISO), the IEEE 802.3ad link aggregation may be represented as a software entity, which is located in the data link layer (DLL) of the PRM. Expanding further upon the DLL, the IEEE 802.3ad software entity may be represented as a link aggregation sublayer. Relative to the link aggregation sublayer, the next higher layer protocol entity within the DLL may be a medium access control (MAC) client. Exemplary MAC clients may comprise a bridge relay entity, or a logical link control (LLC) layer entity. Also relative to the link aggregation sublayer, the next lower layer protocol entity within the DLL may be one or more instances of a MAC sublayer. There may be a single instance of the MAC sublayer for each distinct physical link within the networking device.
The link aggregation sublayer may comprise functionalities individually referred to as distributor and collector. The distributor function may operate within a source network device while the collector function may operate within a destination network device. The distributor may receive packets from the MAC client, select a MAC sublayer entity from among a group of MAC sublayer entities associated with a logical link, and send the packet to the selected MAC sublayer entity. The MAC sublayer entity may then cause the packet to be transmitted via the associated physical link to the destination network device. The IEEE 802.3ad standard may require that the distributor send packets associated with a given conversation to a specific MAC sublayer entity due to temporal ordering restrictions.
The collector function may receive packets from a MAC sublayer entity, Ml, among a group of MAC sublayer entities. The collector may send the packet to the MAC client. Each received packet may be associated with a conversation identifier, for example cid=1. Temporal ordering restrictions may require that each packet containing cid=1 be received at the collector via the MAC sublayer entity Ml.
Physical medium entity (PME) aggregation is another existing method for aggregation. A specification for PME aggregation may be found in IEEE standard 802.3ah (IEEE 802.3ah). The IEEE 802.3ah standard specifies a method for defining a single logical link by aggregating individual digital subscriber line (DSL) interface links. The IEEE 802.3ah standard may specify two interface links, which may be utilized for PME aggregation: 10PASS-TS, a 10 Mb interface and 2BASE-TL, a 2.5 Mb interface. For example, PME aggregation may enable defining a single logical link with a theoretical aggregate link bandwidth of 10 Mb by forming an association of four 2BASE-TL interface links.
Referring to the ISO PRM, the IEEE 802.3ah PME aggregation may be represented as a hardware entity, which is located in the physical (PHY) layer. Expanding further upon the PHY layer, the IEEE 802.3ah hardware entity may be represented as a PME aggregation sublayer. The PME aggregation sublayer may be located within the physical coding sublayer (PCS). Within the PCS, the next higher layer protocol entity relative to the PME aggregation sublayer may be the MAC-PHY rate matching sublayer. Relative to the PCS, the next higher layer protocol entity may be the MAC sublayer, and the next lower layer protocol entity may be the PME layer. Relative to the MAC sublayer, the next higher layer protocol entity may be the MAC client. Each instance of a PME may correspond to a physical interface link located within a networking device.
The PME aggregation sublayer, within the PCS sublayer, may interface with a plurality of PMEs. A single PME aggregation sublayer instance may correspond to a single logical link while the plurality of PMEs, which interface to the PME aggregation sublayer instance, may correspond to the interface links that are associated with the logical link. In this regard, the PME aggregation sublayer may enable a logical link, LL2, to comprise a group of interface links DL1, DL2, DL3 and DL4, for example. The theoretical bandwidth of the link LL2 may be equal to the sum of bandwidths of interface links DL1, DL2, DL3 and DL4.
Within a source networking device A_, the PME aggregation sublayer instance may enable data associated with a conversation between the networking devices A_ and B_ to be distributed among the group of interface links associated with a single logical link. The PME aggregation sublayer instance may receive packets sent from the MAC sublayer. The packets may contain data associated with the conversation between networking devices A_ and B_, for example. The PME aggregation sublayer may divide each packet into a plurality of fragments, and distribute the fragments among the PME layer entities such that the plurality of fragments is transmitted via a plurality of interface links selected from the group of interface links associated with the single logical link. For example, a packet may be divided into fragments F1, F2, F3 and F4, respectively, where F1 represents the first portion of the packet after removal of the preamble, F2 the second portion, F3 the third portion and F4 represents the last portion of the packet. A logical link LL2 may be formed by an association among the interface links DL1, DL2, DL3 and DL4. The corresponding PME entities may be PM1, PM2, PM3 and PM4, respectively. The PME aggregation sublayer may send fragment F1 to PM1, F2 to PM2, F3 to PM3 and F4 to PM4. Correspondingly, PM1 may send the fragment F1 via interface link DL1, PM2 may send the fragment F2 via DL2, PM3 may send F3 via DL3 and PM4 may send F4 via DL4.
Within the destination networking device B_, the PME aggregation sublayer instance may enable reception of the fragments F1, F2, F3 and F4 from a plurality of PMEs. The PME aggregation sublayer instance within B_ may not place restrictions on the order in which each of the fragments is received. The PME aggregation sublayer instance may rearrange the fragments in order F1, F2, F3 and F4 and assemble a received packet. A completed packet may be assembled by appending a preamble field to the assembled received packet. The preamble field appended by the PME aggregation sublayer instance within the destination networking device B_ may comprise a determined binary value, for example 10101010. The appended preamble field may not comprise the same binary value as did the preamble field removed by the PME aggregation sublayer instance within the source networking device A_. Thus, the source networking device A_ may not be able to utilize the preamble field to communicate information, such as OAM, to the destination networking device B_. The PME aggregation sublayer instance may send the completed packet to the MAC sublayer.
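The loss of the original preamble during reassembly may be illustrated with a simplified sketch. It assumes an 8-octet preamble and tags each fragment with a plain index rather than a real fragment header; the functions send and receive are hypothetical.

```python
PREAMBLE_LEN = 8  # assumed preamble length in octets
FIXED_PREAMBLE = bytes([0b10101010]) * PREAMBLE_LEN


def send(packet, n_links=4):
    """Drop the preamble and cut the rest into n_links indexed fragments."""
    body = packet[PREAMBLE_LEN:]
    size = -(-len(body) // n_links)  # ceiling division
    return [(i, body[i * size:(i + 1) * size]) for i in range(n_links)]


def receive(fragments):
    """Reorder fragments by index and prepend a fixed preamble."""
    ordered = sorted(fragments, key=lambda f: f[0])
    body = b"".join(chunk for _, chunk in ordered)
    return FIXED_PREAMBLE + body


# The source places (hypothetical) OAM octets in the preamble.
original = b"OAM-INFO" + bytes(range(100))
f1, f2, f3, f4 = send(original)

# Fragments arrive out of order; the destination still rebuilds the body.
rebuilt = receive([f1, f3, f2, f4])
```

The rebuilt packet carries the same body as the original, but its preamble is the fixed pattern, so whatever the source placed in the preamble does not reach the destination.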
The PME aggregation sublayer instance directly receives packets from the MAC-PHY rate matching sublayer. An inter packet gap (IPG) is inserted between packets such that after the last bit from a current packet is received, a time delay as defined by the IPG will elapse before the PME aggregation sublayer instance receives the first bit from the next packet. The first portion of the packet may contain a preamble field. The preamble field may have a fixed length as measured in octets, for example 8 octets. The preamble field may be utilized for synchronization, or for other purposes, such as to communicate OAM information between communicating network devices.
At the source networking device, the PME aggregation sublayer instance removes the preamble field and copies a first portion of the packet following the preamble field as a first fragment payload, FP1. The fragment payload may have a length specified from within a range of values from 64 octets to 512 octets. The PME aggregation sublayer instance may append a first fragment header FH1 to the fragment payload FP1. The fragment header may have a specified length, for example 2 octets. A frame check sequence (FCS) may be computed and appended to FP1 as a first FCS, FCS1. The collection of fields FH1, FP1 and FCS1 forms a first fragment F1. The FCS is utilized at the destination networking device to enable detection and/or correction of bit errors in a received fragment.
The PME aggregation sublayer instance copies a second portion of the packet following the portion copied for FP1. The second portion becomes a second fragment payload, FP2. A second fragment F2 is generated by appending a fragment header FH2, and frame check sequence FCS2 to FP2. The PME aggregation sublayer may continue copying subsequent portions of the packet until the last portion has been copied. After the last portion of the packet, FPN, has been copied a last fragment, FN, may be generated. At this point, the packet has been fragmented.
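The copying of successive packet portions into fragment payloads may be sketched as follows. Several details here are assumptions: the 8-octet preamble, the use of the maximum 512-octet payload for every fragment but the last, and the use of a 4-octet CRC-32 from Python's zlib module as a stand-in for the FCS. The sketch does not enforce the 64-octet lower bound on the final payload and omits the fragment header.

```python
import zlib

PREAMBLE_LEN = 8    # assumed preamble length in octets
MAX_PAYLOAD = 512   # upper end of the payload range in the text


def fragment_payloads(packet, max_payload=MAX_PAYLOAD):
    """Remove the preamble and return payloads FP1..FPN in order."""
    body = packet[PREAMBLE_LEN:]
    return [body[i:i + max_payload] for i in range(0, len(body), max_payload)]


def with_fcs(payload):
    """Append a check value computed over the payload (stand-in FCS)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def fcs_ok(fragment):
    """Recompute the check value at the destination and compare."""
    payload, fcs = fragment[:-4], fragment[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == fcs


packet = bytes(PREAMBLE_LEN) + bytes(i % 256 for i in range(1200))
payloads = fragment_payloads(packet)
fragments = [with_fcs(fp) for fp in payloads]

# Simulate a bit error in the first octet of the first fragment.
corrupted = bytes([fragments[0][0] ^ 0x01]) + fragments[0][1:]
```

A 1200-octet body yields payloads of 512, 512 and 176 octets, and the corrupted fragment fails the check at the destination.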
The fragment header field for each fragment, FH, contains a sequence number, SN, a start of packet (SOP) field and an end of packet (EOP) field. The SN field may be 14 bits in length, for example. The value contained in the SN field may be incremented for each subsequent fragment that is generated within the PME aggregation sublayer. For example, SN=1 for F1, SN=2 for F2, . . . . The SN field enables the PME aggregation sublayer instance within the destination networking device to identify an order in which fragments were sent by the source networking device. The field SOP=1 for the first fragment generated from a received packet, for example F1. For fragments other than the first fragment, SOP=0. The field EOP=1 for the last fragment generated from a received packet, for example FN. For fragments other than the last fragment, EOP=0.
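A 2-octet header holding a 14-bit SN together with one-bit SOP and EOP fields may be packed as in the sketch below. The bit positions chosen here (SOP in bit 15, EOP in bit 14, SN in the low 14 bits) are an assumption for illustration and should not be read as the bit layout defined by IEEE 802.3ah.

```python
SN_MASK = 0x3FFF  # 14-bit sequence number


def pack_header(sn, sop, eop):
    # Assumed layout: SOP in bit 15, EOP in bit 14, SN in bits 13..0.
    value = (sop << 15) | (eop << 14) | (sn & SN_MASK)
    return value.to_bytes(2, "big")


def unpack_header(raw):
    value = int.from_bytes(raw, "big")
    sn = value & SN_MASK
    sop = (value >> 15) & 1
    eop = (value >> 14) & 1
    return sn, sop, eop


first = pack_header(sn=1, sop=1, eop=0)   # F1 of a packet
last = pack_header(sn=3, sop=0, eop=1)    # FN of the same packet
wrapped = pack_header(sn=0x4000, sop=0, eop=0)  # SN wraps modulo 2**14
```

Because the SN field has a fixed width, the counter wraps after 16384 fragments; a receiver would need to account for that when comparing sequence numbers.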
The SOP and EOP fields enable the PME aggregation sublayer instance within the destination networking device to identify which block of fragments is associated with the same packet. For example, when the destination PME aggregation sublayer instance receives a fragment for which the fragment header fields are: SN=i, SOP=1 and EOP=0, a first fragment may be identified. Thus, a fragment for which the fragment header fields are: SN=i+1, SOP=0 and EOP=0 identifies a second fragment with at least one additional fragment to follow. A fragment for which the fragment header fields are: SN=i+2, SOP=0 and EOP=1 identifies a third and last fragment associated with a packet.
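The way a destination might use the SN, SOP and EOP fields to group fragments into packets may be sketched as follows. The function assemble is hypothetical, represents each fragment as a simple tuple, and ignores lost fragments and SN wraparound, both of which a real receiver would have to handle.

```python
def assemble(fragments):
    """Group fragments into packet bodies.

    fragments: iterable of (sn, sop, eop, payload) tuples in any order.
    Returns the reassembled packet bodies in sending order.
    """
    packets = []
    current = None  # payloads of the packet being rebuilt, if any
    for sn, sop, eop, payload in sorted(fragments, key=lambda f: f[0]):
        if sop:
            current = []          # SOP=1 opens a new packet
        if current is None:
            continue              # no SOP seen yet: skip the stray fragment
        current.append(payload)
        if eop:
            packets.append(b"".join(current))  # EOP=1 closes the packet
            current = None
    return packets


received = [
    (6, 0, 0, b"BB"),  # SN=i+1: middle fragment
    (8, 1, 1, b"DD"),  # single-fragment packet: SOP=1 and EOP=1
    (5, 1, 0, b"AA"),  # SN=i:   first fragment
    (7, 0, 1, b"CC"),  # SN=i+2: last fragment
]
bodies = assemble(received)
```

Sorting by SN restores the sending order, after which the SOP and EOP flags mark where each packet begins and ends.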
The IEEE 802.3ad link aggregation defines a link aggregation sublayer that is located within the DLL in the ISO PRM. The IEEE 802.3ad link aggregation may impose temporal ordering restrictions, which may limit scalability properties of link aggregation. The IEEE 802.3ah PME aggregation defines a PME aggregation sublayer, which is located within the PHY layer in the ISO PRM. Relative to the PME aggregation sublayer, the next higher and next lower protocol layer entities may each be located within the PHY layer.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.