Information networks are well known in the art and function to transmit information such as computer data between various computer systems operably coupled to the information network.
One example of a packet-switched network is defined by the IEEE 802 standards, including the set of standards within IEEE 802 commonly known as Ethernet. These standards have found widespread acceptability and many networks conform to these standards.
Packet-switched networks are distinguished from networks using other multiplexing techniques in that each packet header is inspected to determine where to forward the packet in order to move the packet closer to its final destination.
A second example is a purely circuit-switched network, which operates by creating and maintaining a circuit between two network nodes and transmitting data over that circuit. Circuit-switched networks may use Time Division Multiplexing (TDM), in which case such a circuit has a fixed bandwidth, which poses many disadvantages.
Packet networks make use of data plane protocols, which constitute an agreement among parties regarding the encapsulation or modulation of information. At the lowest, physical layer, protocols define the modulation of electrical or optical signals. At slightly higher layers, protocols define bit patterns used to identify the beginning and end of packets. At this layer and at higher layers, protocols encode information related to the delivery of information across highly complex networks.
Communication networks, whether communicating between computers within a single building or between two metropolitan areas, e.g., San Francisco and New York, are formed by a plurality of interconnected network elements. The network elements and the interconnections between elements are commonly referred to using a slight variation on graph theory terminology. Network elements are referred to as “nodes”. Interconnections between network elements are referred to as “links”. In the mathematical discipline of graph theory the term “edge” is used where in information networking the term “link” is used.
In information networking the term “edge” is used to indicate the part of a network immediately adjacent to one or more “end systems”, where an “end system” transmits and receives packets for its own use but does not forward packets for the benefit of other nodes in the network. In many modern networks all nodes both transmit and receive packets that are used for their own purposes. The term end system indicates that the sole purpose of a given node or set of nodes in a network is to use the services of the network rather than to provide services. For example, the primary purpose of the core of a network is to forward large volumes of traffic for the benefit of other nodes. The primary purpose of the edge of a network is to deliver traffic to end systems. End systems only source and sink traffic.
Data plane protocols are used to facilitate the delivery of data from one computer or end system in a network to another. Data plane protocols generally place information immediately preceding the data to be delivered. The data to be delivered is known as the payload. The information placed in front of the payload is known as the packet header. The packet header generally carries information regarding where and how to deliver the packet. The payload may be followed by other information defined by the protocol, such as a frame check sequence to ensure the integrity of the header and payload. The entire packet definition dictated by a particular protocol is known as that protocol's encapsulation.
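By way of illustration only, the header, payload, and frame check sequence arrangement described above can be sketched as follows. The three-field header layout and the function names are hypothetical and do not correspond to any particular protocol; CRC-32 from the Python standard library stands in for whatever check sequence a real protocol defines.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical toy header: destination, source, payload length (2 bytes each).
HEADER_FORMAT = "!HHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
FCS_SIZE = 4  # a CRC-32 value occupies four bytes


def encapsulate(dst: int, src: int, payload: bytes) -> bytes:
    """Place a header in front of the payload and a check sequence after it."""
    header = struct.pack(HEADER_FORMAT, dst, src, len(payload))
    fcs = struct.pack("!I", zlib.crc32(header + payload))
    return header + payload + fcs


def decapsulate(packet: bytes) -> Tuple[int, int, bytes]:
    """Verify the check sequence, then return (dst, src, payload)."""
    body, fcs = packet[:-FCS_SIZE], packet[-FCS_SIZE:]
    if struct.unpack("!I", fcs)[0] != zlib.crc32(body):
        raise ValueError("frame check sequence mismatch")
    dst, src, length = struct.unpack(HEADER_FORMAT, body[:HEADER_SIZE])
    payload = body[HEADER_SIZE:]
    if len(payload) != length:
        raise ValueError("payload length mismatch")
    return dst, src, payload
```

In this sketch the check sequence covers both the header and the payload, so corruption of either is detected when the packet is decapsulated.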
A packet may be encapsulated by a computer transmitting the packet into a large network with information about the final delivery of the packet. A series of related packets sent between two end systems is a type of traffic flow known in IETF terminology as a “microflow” and in IEEE 802.1AX terminology as a “conversation”.
One type of network is referred to as a “packet network”. A key requirement of a packet network is to deliver information from one computer or end system in the network to another, as directed by a specific protocol. Modern networks carry millions, if not billions, of individual microflows at any given time, where the microflows are tiny in capacity relative to the capacity of the network and are extremely short lived.
Within the core of a communications network it is useful to forward large traffic aggregates rather than individual microflows. The Internet Protocol (IP), for example, supports this directly in its method of address allocation. A full IP address is 32 bits in IP Version 4 (IPv4) and 128 bits in IP Version 6 (IPv6). A set of higher order bits can be used to forward a traffic aggregate. For example, a trading station in the San Francisco financial district may exchange packets with a server operated by a stock exchange in the New York financial district. The full IP addresses identify the end systems. A smaller number of bits in the address may identify the address as falling within the New York metropolitan region. Additional bits, used only within the New York metropolitan region, might identify the destination as belonging to a particular stock exchange on Wall Street. Once the packet is delivered to the exchange, the full address can then be used to reach the specific server. This form of addressing is defined in the IETF as Classless Inter-Domain Routing (CIDR).
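As a minimal sketch of prefix-based aggregation, the following example selects the longest matching prefix for a destination address. The prefixes are drawn from address ranges reserved for documentation, and the table contents and next-hop names are hypothetical. For brevity a single table holds all three prefix lengths, whereas in the scenario described above the shorter prefixes would be held by nodes far from the destination and the longer prefixes by nodes near it.

```python
import ipaddress
from typing import Optional

# Hypothetical forwarding table keyed by prefix (documentation addresses only).
ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): "toward-metro-region",
    ipaddress.ip_network("198.51.100.128/26"): "toward-exchange",
    ipaddress.ip_network("198.51.100.130/32"): "exchange-server",
}


def lookup(destination: str) -> Optional[str]:
    """Return the next hop of the longest prefix containing the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

A destination that matches only the 24-bit prefix is forwarded as part of the large aggregate, while a destination matching the full 32-bit address is delivered to the specific server.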
Some protocols make use of further encapsulations when aggregating traffic. Multiprotocol Label Switching (MPLS) is one such protocol. Ethernet Provider Bridging is another such protocol. For example, within the network in the San Francisco Bay area, a node may further encapsulate all traffic destined to the New York metropolitan area with an MPLS header, which in MPLS is called a label stack, or if the packet is already encapsulated as MPLS, add one or more label stack entries.
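The addition of label stack entries can be illustrated with the following sketch, which assumes the 32-bit label stack entry layout of the MPLS label stack encoding (a 20-bit label, a 3-bit traffic class field, a 1-bit bottom-of-stack flag, and an 8-bit time-to-live). The function names, default field values, and label values are hypothetical.

```python
import struct
from typing import List, Tuple


def encode_entry(label: int, tc: int, bottom: bool, ttl: int) -> bytes:
    """Pack one label stack entry into four bytes in network byte order."""
    if not (0 <= label < 2**20 and 0 <= tc < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def push(stack: bytes, label: int, tc: int = 0, ttl: int = 64) -> bytes:
    """Add a new top entry; it is the bottom of stack only if the stack was empty."""
    return encode_entry(label, tc, bottom=(len(stack) == 0), ttl=ttl) + stack


def decode_stack(stack: bytes) -> List[Tuple[int, int, bool, int]]:
    """Return (label, tc, bottom, ttl) for each entry, top of stack first."""
    entries = []
    for offset in range(0, len(stack), 4):
        (word,) = struct.unpack("!I", stack[offset:offset + 4])
        entry = (word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF)
        entries.append(entry)
        if entry[2]:  # stop at the bottom-of-stack flag
            break
    return entries
```

Pushing a second entry onto an existing stack places the new entry in front of the old one, so the most recently added encapsulation is the first to be transmitted.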
In many protocols further encapsulations can be added in order to form larger traffic aggregates. Forming larger traffic aggregates reduces the amount of control information exchanged and reduces the number of forwarding entries required deep in the core of a network. Each encapsulation is referred to as a layer of encapsulation. In some circles additional MPLS label stack entries are referred to as sub-layers, but the sub-layer terminology will not be used herein.
The outside encapsulations are transmitted first. In MPLS the outer encapsulation is also referred to as the top label stack entry or top of the label stack. Inner MPLS encapsulations are referred to as lower label stack entries and are referred to as residing below the upper label stack entries. This use of “upper” and “lower” in describing label stack entries conflicts with the use of “upper” and “lower” in describing more general layering.
In many cases more than one link may interconnect a pair of nodes. In other cases, more than one indirect path at a lower layer may be available between a pair of nodes involving one or more intermediate nodes. In many cases it is desirable to spread traffic over one or more direct links, or one or more lower layer paths when forwarding large traffic aggregates across a network.
A number of techniques involve spreading the traffic flows across multiple links or multiple lower layer paths. Collectively these solutions are called multipath techniques. A set of individual links or individual lower layer paths over which a multipath technique operates is called a multipath. Each of the individual links or individual lower layer paths is called a component of the multipath. A term which is roughly synonymous with multipath is composite link; however, the two are not quite equivalent.
A common and well documented set of techniques uses a hash function applied over information in packet headers as a basis for distributing traffic across the set of links in a multipath. These techniques commonly search for the innermost encapsulation which can practically be identified, such that the largest number of generally small flows or microflows can provide input to the hash, thereby providing a greater probability of an even distribution of traffic. Some multipath techniques support making adjustments to correct slight imbalances in traffic among the component links or lower layer paths. Using information at the innermost encapsulation, where the least amount of traffic aggregation has occurred, allows adjustments in load balance to be made at a very fine granularity for those techniques that support this form of adjustment.
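A minimal sketch of hash-based distribution follows. It assumes that the innermost identifiable encapsulation yields source and destination addresses, a protocol number, and source and destination ports; the choice of these fields, the use of CRC-32 as the hash, and the function names are illustrative assumptions rather than a description of any particular implementation.

```python
import zlib


def flow_key(src_ip: str, dst_ip: str, protocol: int,
             src_port: int, dst_port: int) -> bytes:
    """Build a hash input from header fields that identify one microflow."""
    return f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()


def select_component(key: bytes, component_count: int) -> int:
    """Map a microflow to one component of the multipath.

    Every packet of a microflow produces the same key and therefore the same
    component, so packets within that microflow are not reordered.
    """
    if component_count < 1:
        raise ValueError("a multipath needs at least one component")
    return zlib.crc32(key) % component_count
```

Because the hash input is taken from individual microflows, the packets of one large traffic aggregate are in general spread over several components, while the packets of any single microflow remain on one component.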
MPLS-TP is a restricted subset of MPLS intended to provide capabilities and management that more closely resemble those familiar to transport network operators, who are likely to be accustomed to the operation of legacy TDM networks. MPLS-TP has placed new requirements on the underlying server layer. Among these requirements is that traffic within an MPLS-TP traffic flow cannot be reordered. This requirement conflicts with the behavior of existing multipath techniques, which may distribute the packets of a single traffic aggregate across several components.
Existing multipath techniques include but are not limited to the following three examples.
1. ECMP—Equal cost multipath (ECMP) has been applied to IP networks since the 1980s. ECMP is defined for the IETF OSPF protocol and for the ISIS protocol, among others.
2. Ethernet Link Aggregation—The IEEE has defined 802.1AX 2010. This is a form of multipath to be applied exclusively to Ethernet.
3. MPLS Link Bundling refers to an MPLS technique which allows multiple links or lower layer paths between a pair of MPLS label switched routers to be announced in a link state routing protocol as a single Label Switched Router forwarding adjacency (link). Any one link or lower layer path in a link bundle is referred to as a component of the link bundle or more briefly as a component link. An LSP may be placed on a single component or may be spread out over multiple components. When traffic is spread out over multiple components, control plane signaling and management protocols report that the “all ones” component is used, indicated by a binary component number containing all ones (a near impossibly large component number).
Within any of these multipath techniques, the traffic across a multipath need not be evenly distributed. For example, an Ethernet Link Aggregation Group (LAG) may have some members (component links) of one capacity (10 Gb/s for example) and some members of another capacity (40 Gb/s or 100 Gb/s for example). In the case of link bundling, the component links may be other MPLS LSPs, whose capacity is expressed as a real number in bytes per second.
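One way an uneven distribution might be realized, offered only as a sketch under stated assumptions, is to divide the range of hash values among the components in proportion to their capacities. The integer capacities, the CRC-32 hash, and the function names below are hypothetical and are not drawn from any of the techniques listed above.

```python
import bisect
import itertools
import zlib
from typing import Sequence


def component_for_point(point: int, capacities: Sequence[int]) -> int:
    """Map a point in [0, sum(capacities)) to the component owning that slice."""
    boundaries = list(itertools.accumulate(capacities))
    if not 0 <= point < boundaries[-1]:
        raise ValueError("point outside the capacity range")
    return bisect.bisect_right(boundaries, point)


def select_weighted(key: bytes, capacities: Sequence[int]) -> int:
    """Hash a flow key onto the capacity range, then pick the owning component."""
    return component_for_point(zlib.crc32(key) % sum(capacities), capacities)
```

With capacities of 10, 10, and 40 (in Gb/s, for example), the first two components each own one sixth of the range and the third owns two thirds, so the third component is expected to receive roughly four times as many microflows as either of the others.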
A method and apparatus which simultaneously meet the following two requirements would be beneficial to information networks, in particular to large information networks.
1. The method and apparatus should be capable of transporting packets conforming to requirements to avoid packet reordering among traffic aggregates contained within larger traffic aggregates, specifically but not limited to MPLS-TP traffic aggregates within larger MPLS traffic aggregates.
2. The method and apparatus should be able to take advantage of multipath techniques.
It is to such a method and apparatus that the inventive concept disclosed herein is directed.