1. Technical Field
The present invention pertains to the field of communications networks, in particular to the field of packet forwarding schemes in digital communications networks.
2. Descriptions of the Related Art
List of acronyms:
AD      Administrative Domain
FEV     Forwarding Enable Vector
FIT     Forwarding Instruction Tag
IFU     Interface Unit
L1      ISO OSI Stack Layer 1
L2      ISO OSI Stack Layer 2
L3      ISO OSI Stack Layer 3
LER     Label Edge Router
LSR     Label Switch Router
LSE     MPLS Label Stack Entry
MPLS    Multi-Protocol Label Switching, IETF RFC 3032
ISP     Internet Service Provider
IP      Internet Protocol, IPv4: IETF RFC 791, IPv6: IETF RFC 2460
PPP     Point-to-Point Protocol, IETF RFC 1661
QoS     Quality of Service
TTL     Time To Live
The purpose of packet-switching networks, such as the Internet, is to deliver data packets from the source node of a packet to a destination node of the packet, wherein a node means a host, a server, a switch or a router. To be delivered to its proper destination by a packet-switching network, a packet needs to include a destination identifier. Out of the nodes addressed within and reachable by a certain network domain, a packet may be destined to a particular single node, to a certain set of nodes, or to one of a specified set of nodes. Thus the destination identifier of a packet should be considered a forwarding instruction for the network domain to deliver the packet to a proper set of nodes reachable by the network domain. It further appears that such a forwarding instruction of a packet has significance only within the network domain interconnecting the set of nodes reachable by it, i.e. the forwarding instruction is said to be local to that network domain.
Typically, the finite number of external interfaces of a network domain can be numbered, i.e. addressed with interface identification numbers, so that each interface of the network domain has a unique address, i.e. identification number, within that domain. As a basic example, assume a network interconnecting one hundred nodes, so that each of the one hundred nodes has a single and dedicated interface to the network. These node-specific interfaces can then be addressed with their related interface numbers, which could be e.g. the integers from 1 through 100 (inclusive). Thus, in the event that any of the one hundred nodes needs to send a packet to the node behind the network interface #75, the node specifies number 75 as the domain-scope destination identifier in the forwarding instruction included in the packet header. The network domain will then try to deliver that packet to the node associated with its interface #75. This process of the network delivering a packet to a destination node based on a forwarding instruction is called routing the packet.
The above model of destination identifier-based packet routing (also called switching or forwarding) is generally quite efficient for unicasting, i.e. for delivering a data packet to a single destination specified by its destination identifier, and it is the basic model of the current packet-switched communications protocols, such as IP, FR, ATM or MPLS, all of which use an integer number to identify the network-domain-scope destination for each packet or cell. Unicast destination identifier-based forwarding requires the packet-switching nodes to resolve the next-hop destination for each packet using route information databases such as routing, switching or forwarding tables, collectively called switching-tables, which provide a mapping between the packet destination identifiers and their associated forwarding instructions, wherein a forwarding instruction includes an identification of the egress port (or equivalent) on which the switch should forward the packet. Naturally, such switching-tables need to be configured and maintained in order for the network to work properly, a process that is known to be rather complex, especially for networks with a large number of packet-switching nodes.
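As an illustration only, the table-driven unicast lookup described above can be sketched as follows. The table contents, the port and link numbers, and the function name `forward_unicast` are hypothetical and are not taken from any particular protocol or product.

```python
# Hypothetical switching-table of one packet-switching node:
# destination identifier -> (egress port, egress link identifier).
# In a real network this table is configured by management software.
SWITCHING_TABLE = {
    75: (3, 1075),
    12: (1, 2012),
}


def forward_unicast(dest_id, table=SWITCHING_TABLE):
    """Return the single forwarding instruction configured for dest_id.

    Returns None when the destination identifier has no table entry,
    in which case the node cannot forward the packet.
    """
    return table.get(dest_id)
```

Note that each destination identifier maps to exactly one egress port, which is the property that the remainder of this discussion identifies as a limitation.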
However, despite their increasing complexity, the current packet-switching techniques have certain serious limitations, particularly in the areas of multicasting, anycasting and traffic protection.
Conventionally a packet-switch, when receiving a packet (a ‘packet’ is used here to refer also to a ‘cell’) on one of its ingress ports, uses the destination identifier of the packet to look up, from a switching-table configured by network management software, the egress port and the egress link identifier configured for the packet. While this process requires configuring and maintaining a switching-table containing forwarding instructions per each destination identifier for each packet-switching node in the network, and even though the packet-switches thus become quite complex, this regular packet-switching method does not allow configuring more than a single egress port and link identifier in the switching-tables per ingress link without substantial additional complexity.
Thus, conventional packet-switching is not efficient when a portion of the packets would need to be forwarded to a group of more than one egress port, or to any suitable egress port out of such a defined multicast or anycast group. This in turn requires either multiplying the complexity of conventional packet-switching to support multicast and anycast, thereby limiting the scalability and reducing the cost-efficiency of the switching technology, or replicating a multicast packet multiple times to be sent individually, i.e. unicast, to each individual destination. For anycast type of traffic, e.g. in case that one out of a group of servers should be contacted, this unicast method typically cannot dynamically select the least loaded, i.e. currently best responding, server, resulting in non-balanced server load patterns and often poor client performance experience.
Traffic protection re-routing at the packet level requires a packet to be forwarded, at some point in the network between its source and destination nodes, to a non-default ‘detour’ route to avoid an unexpected failure associated with the route it would normally use. With conventional packet switching, such protection re-routing involves software-based reconfiguration of routing, switching and/or forwarding tables of the nodes in the network between the source and the destination of the packet, which causes non-deterministic and often intolerably long traffic protection restoration completion times, especially in the case of multiple route, switch and forwarding table entries that would need to be reconfigured simultaneously or over a short period of time. It appears that pre-computing a protection route and indicating both the regular and the protection route in the forwarding instructions of the packets, and using a packet-switch-interconnect network that delivers the packet along the appropriate route based on real-time route status, would be needed to provide deterministic, efficient and fast packet-level traffic protection. However, such features are not supported by conventional packet-switching technologies that are based on the unicast model and software-configured switching and routing tables.
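The route selection that the preceding paragraph finds desirable can be sketched as below. This is a minimal illustration under stated assumptions, not a description of any existing protocol: the packet is assumed to carry both a regular and a pre-computed protection route identifier, and `route_up` is a hypothetical mapping holding the real-time status of each route.

```python
def select_route(regular, protection, route_up):
    """Choose between two route identifiers carried in a packet.

    regular and protection are hypothetical route identifiers;
    route_up maps a route identifier to True when the route is
    currently usable. No table reconfiguration is involved: the
    decision uses only the packet's own instructions and live status.
    """
    if route_up.get(regular, False):
        return regular
    if route_up.get(protection, False):
        return protection
    return None  # neither route is usable at the moment


# The regular route "A" has failed, so the detour "B" is chosen.
status = {"A": False, "B": True}
chosen = select_route("A", "B", status)
```

Because the decision is a constant-time check per packet, its completion time does not depend on how many table entries a software-based restoration would have had to rewrite.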
The fundamental difficulty in providing deterministic end-to-end QoS and optimized efficiency of network resource utilization is that packet traffic in service provider networks consists of a multitude of individual variable-bandwidth traffic flows across the networks that cannot be coordinated with each other. Thus, unless traffic flows are rate controlled, network congestion can occur, resulting in packets getting delayed or lost before reaching their destinations, in which cases the packets must often be retransmitted, so that a single packet consumes network capacity (air-time) multiple times, further worsening the congestion. On the other hand, rate control defeats the original purpose and efficiency of packet-switching, i.e. to achieve higher data traffic throughput than with static circuit-switching; in essence, plain standard circuit-switching could be used instead of rate-controlled packet-switching.
To accommodate variable-bit-rate packet traffic flows for constant-bit-rate L1 or L0 transmission, and in particular to provide congestion avoidance and specified QoS parameters, such as bursting tolerance e.g. for rate-controlled traffic, and thereby to reduce packet loss and retransmission rates, packet-switching nodes need to provide packet queuing capability. Packet queuing is conventionally implemented with electrical data storage elements, called buffers, which typically are implemented with RAMs. With rapidly increasing network interface data rate requirements, increasingly large data buffers are needed at packet-switches. Note that if the network system were able to respond to a traffic burst, or link congestion or failure, in one second (currently a non-realistic target), and if it should be able to buffer traffic for that response time to prevent packet loss, a 10 Gbps switch interface should be able to provide buffering for 10 Gb of data for each of its egress ports that are subject to congestion. While 10 Gbps packet-switched network interfaces are in use as of this writing, the current maximum available RAM sizes are less than 1 Gb per chip. Furthermore, the maximum data throughput per RAM chip currently is far below 10 Gbps, approximately in the 1 Gbps range. Thus there is a gap of the order of ten-to-one between the required switch interface data rate capacity and the feasible buffering capacity, which means that the conventional packet queuing techniques based on electrical data storage on RAMs significantly limit the maximum switch port data rates for which any type of QoS and congestion control can be provided. Additionally, as the largest available electrical data storage capacity can currently only be implemented using discrete off-chip RAM parts, the conventional packet queuing mechanisms result in complicated and costly switch hardware implementation.
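The arithmetic behind the ten-to-one gap can be checked with a short calculation. The figures are those quoted in the text; treating the RAM limits as exactly 1 Gb and 1 Gbps per chip is an assumption made here for illustration, since the text only gives them as upper bounds, so the resulting chip counts are lower bounds.

```python
import math

GIGA = 10**9


def buffering_gap(line_rate_bps, response_time_s, chip_bits, chip_bps):
    """Return (buffer bits needed, chips needed for capacity,
    chips needed for throughput) for one congested egress port."""
    buffer_bits = line_rate_bps * response_time_s
    chips_for_capacity = math.ceil(buffer_bits / chip_bits)
    chips_for_throughput = math.ceil(line_rate_bps / chip_bps)
    return buffer_bits, chips_for_capacity, chips_for_throughput


# 10 Gbps interface, 1 s response time, 1 Gb and 1 Gbps per RAM chip.
bits, cap_chips, thr_chips = buffering_gap(10 * GIGA, 1, 1 * GIGA, 1 * GIGA)
```

With these inputs the port needs 10 Gb of buffering, i.e. at least ten chips for capacity and at least ten chips in parallel to sustain the line rate.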
However, in a properly engineered network, i.e. a network that has an adequate amount of capacity to serve its access interfaces and that has no single-point-of-global-failure, if congestion occurs it is typically because a momentary demand for capacity on a certain route or link within the network, such as a server port, exceeds its physical capacity, while at the same time there are under-utilized alternative routes or links within the network. To utilize such alternative routes that are under-utilized at the moment a packet-switching node makes a packet forwarding decision, the node would need to maintain a corresponding set of alternative next-hop destinations per a single packet forwarding identifier within its switching-table, and have real-time traffic load information for each route in that set of alternatives. These features, however, are not supported by the current unicast-oriented L3 routing or unicast and connection-oriented L2 switching techniques, a state of affairs that is currently causing sub-optimal utilization of network resources for dynamic packet traffic.
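The capability that the preceding paragraph finds missing can be sketched as follows. This is an illustration under assumptions: `alternatives` is a hypothetical table holding several next hops per forwarding identifier, and `load` is a hypothetical source of real-time utilization figures (here a fraction between 0 and 1).

```python
def pick_next_hop(forwarding_id, alternatives, load):
    """Pick the currently least-loaded next hop for a forwarding identifier.

    alternatives: forwarding identifier -> list of candidate next hops.
    load: next hop -> current utilization, assumed to be kept up to date.
    """
    candidates = alternatives[forwarding_id]
    return min(candidates, key=lambda hop: load[hop])


alternatives = {7: ["port1", "port2", "port3"]}
load = {"port1": 0.9, "port2": 0.2, "port3": 0.5}
best = pick_next_hop(7, alternatives, load)  # "port2"
```

A conventional switching-table, holding one next hop per identifier, has no place to store the candidate list, which is why this selection cannot be made at forwarding time.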
Furthermore, even if there were no alternative route to bypass a congested egress port of the network domain, in a properly engineered network there are, during the congestion on that particular overloaded link, at the same time under-utilized links, i.e. network fiber capacity having unused bandwidth. Thus, rather than trying to queue the packets destined for the congested link in electrical data buffers on packet switches, it would be more efficient to use the unused fiber bandwidth on non-congested network links as ‘optical’ buffering capacity. Obviously, a flexible and dynamic alternative routing capability, which is not supported by conventional packet-switching techniques, would be necessary to utilize the unused network fiber bandwidth as optical data buffering capacity.
Also, it is worth noting that most hops between L2 or L3 packet-switches, when routing packets from their sources to their destinations, are hops between packet-switches administered by the same network operator, such as an Internet Service Provider, a telecom carrier or a corporate network administrator. The packet-switches within the same network operator's network constitute that operator's administrative domain (AD), which is delimited by border routers, such as Border Gateway Protocol (BGP-4) routers currently for IP, and Label Edge Routers (LERs) for MPLS, and external traffic can be passed to or from that AD only through these nodes. Regarding the ADs, two points are worth noting at this stage. First, when a packet arrives at a network operator's domain, the border router through which the packet arrives needs to be able to resolve to which one(s), if any, of the L3 border routers within the AD it should forward the packet, and therefore no additional L3 packet-switch node is needed within the AD besides the border routers. Secondly, the domain-internal interfaces of the L3 border routers within the operator's AD can be addressed with completely independent interface identifiers by the administrator of the network domain.
Based on the above two points, it appears that the most straightforward way to perform packet forwarding within an AD would be to use a simple connectionless packet-switching network, which can be instructed by the border routers, using simple, AD-local packet forwarding instructions, to deliver each packet properly among the border routers of the AD. Such an AD-local packet forwarding instruction, called a packet forwarding instruction tag (FIT), could be significantly simpler, yet more flexible for AD-local forwarding, than the currently used forwarding identifiers, such as ATM, MPLS or IP headers, as such an AD-local FIT would only need to identify to which ones of the limited number of border gateways of the AD the packet should be delivered. It should be noted that an L2 packet-switching network, instead of a static L1 circuit-switching network, is preferred for interconnecting the border routers, since a regular L1 circuit-switching network, with its constant-bit-rate connections of coarse bandwidth granularity, is inefficient for delivery of variable-bandwidth packet traffic.
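To make concrete how small such an AD-local tag could be, one possible encoding is sketched below. The encoding is an assumption made purely for illustration and is not stated in the text above: border routers of the AD are assumed to be numbered from 0, and the tag is a bit vector with one bit per border router. The function names are hypothetical.

```python
def make_tag(destinations):
    """Build a bit-vector tag with one bit set per destination border router.

    destinations: iterable of border router indices (assumed numbering from 0).
    """
    tag = 0
    for index in destinations:
        tag |= 1 << index
    return tag


def deliver_to(tag, num_border_routers):
    """List the border router indices that the tag selects."""
    return [i for i in range(num_border_routers) if tag & (1 << i)]


# A packet for border routers 0 and 2 of an AD with four border routers.
tag = make_tag([0, 2])           # binary 0101
targets = deliver_to(tag, 4)     # [0, 2]
```

Under this assumed encoding a single set bit expresses unicast and several set bits express multicast, with no per-destination table entry needed inside the AD.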
Another common application for L2 switching, besides that of implementing the core of an AD that passes traffic between the border routers of the AD as discussed above, is that of passing packet traffic between different ADs, i.e. between border routers of different ADs. Such network systems, over which Internet traffic is passed between the domains of different communications service providers, are called in the industry Internet Exchange (IX) facilities or carrier-neutral peering points. In such applications, a semi-permanent L2 addressing and switching system is used to provide a controlled and neutral exchange of traffic between the border routers of different ADs. Again, a common packet-switched network system is used to avoid having to build a mesh of dedicated L1 circuits interconnecting each pair of ADs that need to exchange traffic; the L2 switched exchange facility allows an AD to exchange all of its traffic with each of the other ADs present at the IX using a single shared L1 port. Most importantly, the IX facility needs to deliver each packet to exactly that or those of the AD border routers connected by the exchange facility as instructed by the AD border router that passed the packet to the exchange facility. Thus, again, an optimal packet-switching network system would deliver the packets between the border routers of the different ADs as indicated by the FIT of each packet. Thereby it appears that the same type of simple FIT-based packet-switching network or switch is optimal both for interconnecting the border routers of a single AD and for interconnecting the border routers of different ADs.
Based on the above discussion, there is a need for a new packet-forwarding method that efficiently supports multicasting and anycasting, in addition to unicasting, and that provides dynamic load balancing and reliable and efficient packet-level traffic protection. Such a new packet-forwarding method should further efficiently support packet forwarding with very high data rate network interfaces, and simplify network management.