The present invention is directed to communications networking. It is directed particularly to the routing of ICMP messages in tag-switching networks.
Two local area networks, LAN A 10 and LAN B 20, interconnected through a "backbone" of routers 2, 4, 6, 8 are shown in FIG. 1. A router may have a plurality of interfaces to one or more local networks or to other routers. LAN A includes a router 2 and three host devices 14, 16, 18 which can communicate directly with each other over LAN A bus 12, and LAN B includes a router 8 and three host devices 24, 26, 28 which can communicate directly with each other over LAN B bus 22. The exchange of data between a LAN A device, e.g. HOST A1 14, and a LAN B device, e.g. HOST B1 24, is typically accomplished using an Internet Protocol (IP) datagram. The IP datagram is forwarded in the payload field of link-layer, e.g. Ethernet, communications packets that are exchanged between the backbone routers. The use of an IP datagram allows for the routing of data between network devices that do not have a link-layer connection and, therefore, cannot exchange link-layer packets with each other.
An Ethernet packet 200 having an IP datagram in its payload field 206 is shown in FIG. 2. The IP datagram is encapsulated between an Ethernet header field 202 and a trailing CRC field 204. The Ethernet header field 202 includes a type field 203 that specifies that the payload field 206 contains an IP datagram. The IP datagram includes an IP payload field 208 preceded by an IP header field 210. The IP header field 210 is comprised of a source IP address field 212 (containing IP address "X"), a destination IP address field 214 (containing IP address "Y"), and a protocol field 215. The source address field 212 identifies the originator of the IP datagram, e.g. HOST A1 14, and the destination address field 214 identifies the intended recipient of the IP datagram, e.g. HOST B1 24.
A backbone router typically determines the link over which the IP datagram is to be forwarded by referring to a forwarding table, which contains routing information maintained by the router. Using the "Y" address in the destination IP address field 214, the router performs a longest match search against IP addresses stored in the table. Unfortunately, because the IP address space is so large, the forwarding table may have to be very large. More importantly, a longest match search through the forwarding table can be time consuming, expending valuable router processing resources and slowing the movement of packets through the network.
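The longest match search described above can be sketched as follows. This is an illustration only: the table contents, the link names, and the function name `longest_match` are hypothetical, and the linear scan is chosen for clarity rather than speed.

```python
import ipaddress

# Hypothetical forwarding table: address prefix -> outgoing link.
FORWARDING_TABLE = {
    "10.0.0.0/8": "link-1",
    "10.1.0.0/16": "link-2",
    "10.1.2.0/24": "link-3",
}

def longest_match(table, destination):
    """Return the link of the most specific prefix containing destination, or None."""
    addr = ipaddress.ip_address(destination)
    best_net, best_link = None, None
    for prefix, link in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching prefix with the greatest prefix length.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_link = net, link
    return best_link
```

Every candidate prefix must be compared against the destination address, which is the per-packet cost that the tag-based lookup described next avoids.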
A technique known variously as "tag-switching" or "label-switching" is one way of avoiding the longest match searches. Although packets forwarded by a tag-switching router contain a destination IP address, each packet also includes a stack of one or more "tags," or "labels," employed for forwarding. Although the invention to be described below is not limited to any particular implementation of tag switching, one popular method for implementing it is called Multi-Protocol Label Switching (MPLS), as described in commonly assigned co-pending U.S. patent application Ser. No. 08/997,343, filed Dec. 23, 1997, by Rekhter et al. for Peer-Model Support for Virtual Private Networks with Potentially Overlapping Addresses, which is hereby incorporated in its entirety by reference. When a tag-switching router receives a tagged packet, it uses the top tag in the tag stack to identify an entry in its forwarding table that specifies the next link of the route to the packet's destination. In addition to the forwarding link, the entry typically includes a replacement tag. The receiving tag-switching router replaces the top tag in the stack with the replacement tag before forwarding the IP datagram over the next link.
FIG. 3 illustrates the exchange of an IP datagram over one type of tag-switching network. The tag-switching network is comprised of a first tag-switching edge router PE1 interfacing to a first customer edge router CE1 of a first local network; two tag-switching transit routers P1, P2 connecting the tag-switching edge router PE1 to a second tag-switching edge router PE2; and tag-switching edge router PE2 interfacing to a second customer edge router CE2 of a second local network.
We assume that customer router CE2 sends tag-switching edge router PE2 an Ethernet packet of the type depicted in the second row of FIG. 1 and without a tag stack of the type now to be described. Edge router PE2 prepends such a tag stack before it forwards the packet to transit router P2. Specifically, an Ethernet packet 400 containing a tagged IP datagram and forwarded from edge router PE2 to transit router P2 is shown in FIG. 4. As described above, the Ethernet packet 400 contains a payload field 406 that is encapsulated between the Ethernet header field 402 and a trailing CRC field 404. The Ethernet header field 402 includes a type field 403 that specifies that the payload field 406 contains an MPLS protocol data unit, such as a tagged IP datagram. The payload field 406 holds an IP datagram comprised of an IP payload field 408 preceded by an IP header field 410. The IP header field 410, shown in detail in the first row, includes a source IP address field 412 (containing IP address "X"), a destination IP address field 414 (containing IP address "Y"), an identification field 416, and a fragment offset field 418. In this case, however, the IP payload field 406 is prepended with a tag stack field 420 that contains a top tag stack entry 422 and a bottom tag stack entry 432. Each tag stack entry 422, 432 includes a tag field 424, 434 pointing to an entry in the forwarding table, a "class of service" (COS) field 426, 436, an "end-of-stack" (S) field 428, 438 set to "one" in the bottom tag stack entry 432, and a "time-to-live" (TTL) field 430, 440 to be described below. For simplicity, only the destination IP address field 414 (containing IP address "D1") and the IP payload field 408 (containing "DATA") of the IP datagram are shown in FIG. 3.
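By way of illustration, a tag stack entry with the four fields just described might be packed into a 32-bit word as sketched below. The field widths (20-bit tag, 3-bit COS, 1-bit S, 8-bit TTL) are an assumption borrowed from the standard MPLS label stack encoding and are not stated in the text above; the function names are hypothetical.

```python
import struct

def pack_entry(tag, cos, bottom, ttl):
    """Pack one tag stack entry into 4 bytes (assumed 20/3/1/8-bit layout)."""
    if not (0 <= tag < 1 << 20 and 0 <= cos < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (tag << 12) | (cos << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)  # network byte order

def unpack_entry(data):
    """Inverse of pack_entry: split 4 bytes back into the four fields."""
    (word,) = struct.unpack("!I", data)
    return {
        "tag": word >> 12,
        "cos": (word >> 9) & 0x7,
        "bottom": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }

def pack_stack(entries):
    """Pack (tag, cos, ttl) tuples, top entry first; S is set only on the last."""
    last = len(entries) - 1
    return b"".join(
        pack_entry(tag, cos, i == last, ttl)
        for i, (tag, cos, ttl) in enumerate(entries)
    )
```

A two-entry stack such as the one in FIG. 4 would then occupy eight bytes, with the S field set to one only in the second (bottom) entry.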
Although the formats described in FIGS. 2 and 4 are typical formats for packets exchanged between tag-switching routers, they are not the only formats that such routers may employ. The formats employed on some "Ethernet" links are actually somewhat more complicated than the format depicted here. Moreover, routers that communicate with each other over a point-to-point link, i.e., not by way of a shared medium, typically would employ a link-level protocol, such as SLIP or PPP, that is different from the Ethernet protocol just described. An implementation that is particularly desirable for high-capacity links employs Asynchronous Transfer Mode ("ATM") switches.
An ATM frame 500 having an IP datagram in its payload field 507 is shown in FIG. 5. The IP datagram field 506 and a tag stack field 520 of the payload field 507 are similar to the IP datagram field 406 and tag stack field 420 encapsulated by the Ethernet header 402 and trailer 404 of FIG. 4. The only difference is that the top tag field 524 of the top tag stack entry 522 contains question marks, which indicate that the top tag's contents do not matter.
The reason why the top tag's contents do not matter is that the routing decisions, which are based on those contents when the tag-switching router is implemented as a conventional IP router, are instead based on an ATM VPI/VCI field 546 found in the cell header field 544 of an ATM "cell" 540 when the tag-switching router is implemented as an ATM switch. From the point of view of an ATM client, the ATM frame 500 is the basic unit of transmission, and it can vary in length to as much as 64 Kbytes of payload. (Those skilled in the art will recognize that there are also other possible ATM frame formats, but FIG. 5's third row depicts one, known as "AAL5," that would typically be employed for user data.) From the ATM switch's point of view, though, the basic transmission units are fixed-size cells into which the frames are divided. The cell header field 544, shown in detail in the first row, also includes a PTI field 548. One purpose of the PTI field 548 is to indicate whether its cell is the last one in a frame. If it is, its last eight bytes form the frame trailer field 504. Among other things, the trailer field 504 indicates how much of the preceding cell's payload field 542 is comprised of actual payload, as opposed to padding used to complete a fixed-size cell.
The VPI/VCI field 546 is of particular interest to the present discussion. As is well known to those skilled in the art, ATM systems organize their routes into "virtual channels," which may from time to time be grouped into "virtual paths." Each switch associates a local virtual path/virtual channel indicator (VPI/VCI) with a channel or path that runs through it. When an ATM switch receives a cell, it consults the cell's VPI/VCI field 546 to identify by table lookup the interface through which to forward the cell. It also replaces that field's contents with a value indicated by the table as being the next switch's code for that path or channel, and it sends the resultant cell to the next switch. In other words, the function performed by the VPI/VCI field 546 enables it to serve as the tag stack's top tag. This is why a tag-switching router implemented as an ATM switch can ignore the top tag field 524, on which other implementations rely.
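The division of a frame into fixed-size cells, with padding and an eight-byte trailer in the last cell, can be sketched as follows. The 48-byte cell payload size and the trailer layout (two unused bytes, a two-byte length, and a four-byte CRC left as a zero placeholder here) are assumptions drawn from common AAL5 practice rather than from the text above; the boolean in each returned pair stands in for the last-cell indication carried by the PTI field.

```python
import struct

CELL_PAYLOAD = 48  # assumed bytes of payload per cell
TRAILER_LEN = 8    # the frame trailer occupies the last eight bytes

def segment(frame_payload):
    """Split a frame into (is_last_cell, 48-byte chunk) pairs."""
    if len(frame_payload) > 0xFFFF:
        raise ValueError("payload exceeds the two-byte length field")
    # Pad so that payload + padding + trailer fills a whole number of cells.
    pad_len = -(len(frame_payload) + TRAILER_LEN) % CELL_PAYLOAD
    # The CRC is left as 0; a real implementation would compute it.
    trailer = struct.pack("!BBHI", 0, 0, len(frame_payload), 0)
    padded = frame_payload + bytes(pad_len) + trailer
    chunks = [padded[i:i + CELL_PAYLOAD] for i in range(0, len(padded), CELL_PAYLOAD)]
    return [(i == len(chunks) - 1, chunk) for i, chunk in enumerate(chunks)]

def reassemble(cells):
    """Rebuild the frame payload, using the trailer's length to drop padding."""
    data = b"".join(chunk for _, chunk in cells)
    _, _, length, _ = struct.unpack("!BBHI", data[-TRAILER_LEN:])
    return data[:length]
```

Because the trailer records the true payload length, the receiver can discard the padding without any other knowledge of the frame's contents.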
When tag-switching edge router PE2 receives an IP datagram from customer edge router CE2, it prefixes a first tag T3 that identifies an entry in the forwarding table of the destination tag-switching edge router PE1. The edge router PE2 then prefixes a second, or top, tag T2 that identifies an entry in the forwarding table of the next router, i.e., the first transit router P2, in the backbone path. When the transit router P2 receives the IP datagram, it uses the top tag T2 to identify the location in its forwarding table that specifies the forwarding link and a replacement tag T1 for the route to the edge router PE1; i.e., the transit router P2 does not have to perform a time-consuming longest-match search. It then replaces the top tag T2 with the replacement tag T1 that identifies an entry in the forwarding table of the second transit router P1 in the backbone path and forwards the IP datagram. (We assume that, as in the typical case, there are several transit routers in the backbone path, although in some configurations there may be none and only a single tag will be prefixed. All transit routers, except the last transit router in the backbone path, perform in a manner similar to that of transit router P2.) When the second transit router P1, which is also the last transit router in the backbone path, receives the IP datagram, it strips the top tag T1 and uses it to identify an entry in its forwarding table specifying the forwarding link and then forwards the IP datagram without replacing tag T1. This "exposes" tag T3. When the edge router PE1 receives the IP datagram, it strips the top tag, first tag T3, and uses it to identify an entry in its forwarding table specifying the forwarding link. It then transmits the data packet to the destination customer edge router CE1 over the forwarding link.
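The sequence of tag operations just described for the FIG. 3 path can be sketched as a small simulation. The data structures and function names are hypothetical; only the router names, the tags T1, T2, T3, and the push, replace, and strip behavior come from the description above.

```python
# Per-router tables keyed by incoming top tag:
# (action, replacement tag, next hop). Hypothetical representation.
TABLES = {
    "P2": {"T2": ("swap", "T1", "P1")},   # replace T2 with T1
    "P1": {"T1": ("pop", None, "PE1")},   # strip T1, exposing T3
    "PE1": {"T3": ("pop", None, "CE1")},  # strip T3, deliver to CE1
}

def ingress_pe2(datagram):
    """PE2 prefixes T3 first and then the top tag T2 (stack listed top first)."""
    return {"tags": ["T2", "T3"], "datagram": datagram}, "P2"

def forward(router, packet):
    """Look up the top tag by exact match; no longest-match search is needed."""
    action, replacement, next_hop = TABLES[router][packet["tags"][0]]
    rest = packet["tags"][1:]
    tags = [replacement] + rest if action == "swap" else rest
    return {"tags": tags, "datagram": packet["datagram"]}, next_hop

def trace(datagram):
    """Return the tag stack as the packet leaves each router, plus the final hop."""
    packet, hop = ingress_pe2(datagram)
    path = [("PE2", list(packet["tags"]))]
    while hop in TABLES:
        router = hop
        packet, hop = forward(router, packet)
        path.append((router, list(packet["tags"])))
    return path, hop
```

In this sketch the destination IP address of the datagram is never consulted between PE2 and PE1; each router's decision depends only on the top tag.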
Note that this arrangement, in which the transit routers forward data packets in accordance with entries for the route to the edge router PE1 rather than to the ultimate destination represented by destination IP address D1, relieves the transit routers of the need to maintain forwarding entries for routers outside the tag-switching backbone. In addition to improving network performance and reducing the router processing burden, a tag-switching network is also ideally suited for the implementation of a virtual private network (VPN) wherein two or more private local networks are securely connected over a public network. A VPN may be utilized by a geographically dispersed enterprise to connect its local area networks and thereby avoid the high cost of leased telephone lines.
The above discussion refers to a service provider's router as an "edge router" if it communicates with a customer's router directly, i.e., without any intermediate service-provider router. Routers PE1 and PE2 are examples. The service-provider backbone routers that interconnect two backbone edge routers are called "transit" routers, e.g. P1 and P2. Note that the terms "edge router" and "transit router" have meaning only by reference to a given route. Although the drawing shows only a single route through the service provider domain, there are typically a very large number. For some of these routes PE1 and/or PE2 may serve as transit routers, and P1 and/or P2 may serve as edge routers. Accordingly, a backbone router may be a transit router in one VPN and an edge router in a second VPN.
It is often the case that customer devices on the VPN are identified by IP addresses that are not globally unique. In fact, the IP addresses in one VPN may overlap with addresses used in other virtual private networks supported by the service provider. As described in detail in U.S. patent application Ser. No. 08/997,343, filed Dec. 23, 1997, by Rekhter et al. for Peer-Model Support for Virtual Private Networks with Potentially Overlapping Addresses, non-globally unique IP addresses are allowed in a VPN because the backbone routers rely on the tags, and not the IP addresses, when forwarding tagged IP datagrams.
However, the use of non-globally unique IP addresses, together with the absence of exterior routes in the transit router forwarding table, may cause two different problems to arise during tagged IP datagram transfers across the VPN backbone. The first problem concerns the "time-to-live" (TTL) field that is usually included in data packets transmitted on a public network. TTL fields are employed to prevent data packets from endlessly circulating through and clogging the public network. The TTL field of an IP datagram is initially filled with a predetermined number. Each time the IP datagram is transferred from one router to another router, the number is decremented. If and when the number in the TTL field decrements to zero, the router holding the IP datagram discards it and generates an Internet Control Message Protocol (ICMP) "Lifetime Exceeded" message for transmission back to the network device identified by the IP source address found in the IP datagram. ICMP messages are used to report errors and other conditions that require device attention.
When an IP datagram is initially tagged upon entry into a tag-switching network, the contents of its TTL field are typically transferred to the TTL field of the top tag. Each time the tagged IP datagram is transferred from one tag-switching router to another, the top tag TTL field is decremented. As is the case with conventional routers, if and when the number in the TTL field decrements to zero, the tag-switching router holding the tagged IP datagram discards it and generates a "Lifetime Exceeded" ICMP message for transmission back to the source device. As was mentioned above, though, the transit router may not have stored the forwarding information needed to direct the ICMP message back to the discarded packet's source, so the tag-switching transit router may be unable to route the "Lifetime Exceeded" ICMP message. Among other things, this causes the commonly used "traceroute" tool to fail.
The second problem concerns the "Don't Fragment" (DF) bit that is found in the IP datagram. If the DF bit is set, a router will not fragment the IP datagram into smaller packets. Instead, when a router determines that an IP datagram is too large and the DF bit is set, the router will discard the IP datagram and generate an ICMP "Packet Too Large" message for transmission back to the IP source address found in the data packet. The tag-switching transit router's inability to route the "Packet Too Large" ICMP message causes the "Path MTU Discovery" procedure to fail.
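The two fault conditions described above can be summarized in a short sketch of the check a router might apply to each datagram. The `Datagram` class, the `check` function, and the tuple it returns are hypothetical; the message names are those used in this description.

```python
from dataclasses import dataclass

@dataclass
class Datagram:
    source: str
    destination: str
    ttl: int
    length: int
    dont_fragment: bool = False

def check(datagram, link_mtu):
    """Return (action, ICMP message name or None, ICMP destination or None)."""
    # The TTL is decremented on each transfer; at zero the datagram is discarded.
    if datagram.ttl - 1 <= 0:
        return ("discard", "Lifetime Exceeded", datagram.source)
    if datagram.length > link_mtu:
        # With DF set the router may not fragment, so it discards and reports.
        if datagram.dont_fragment:
            return ("discard", "Packet Too Large", datagram.source)
        return ("fragment", None, None)
    return ("forward", None, None)
```

In both discard cases the ICMP message is addressed to the datagram's source, which is the address a transit router without exterior routes may be unable to reach.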
Therefore, what is needed is a method and apparatus to properly route ICMP messages generated at tag-switching transit routers.
This invention provides a particularly simple method and apparatus for properly routing Internet Control Message Protocol (ICMP) messages in tag-switching backbones that interconnect to conventional Internet Protocol (IP) networks. An IP datagram received by a transit router may have a fault condition, either because it exceeds a lifetime threshold as specified by the "time-to-live" (TTL) field, or because it is too large to be transmitted and cannot be fragmented, as dictated by the "don't fragment" (DF) bit in the IP datagram header field. When a transit router generates the ICMP message to report the fault back to the originator of the IP datagram, it replaces the received IP datagram with one that contains the ICMP message and forwards it as though it were the original packet.
Until it leaves the tag-switching network, the resultant ICMP message will then continue along the (tag-specified) forward path that the discarded IP datagram would have taken. Then the first non-tag-switching router, e.g., CE1 in the FIG. 3 example, will forward it in accordance with the ICMP message's destination IP address, i.e., the discarded IP datagram's source IP address. The resultant route will typically start with the egress router of the forward path, e.g., PE1 in the FIG. 3 example, which, guided by that destination IP address, will properly tag it for transmission back through the tag-switching network toward the discarded IP datagram's source. All of this is accomplished without requiring any additional routing information in the transit routers.
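The forwarding behavior summarized above can be sketched for the FIG. 3 path as follows. This is a simplified model under stated assumptions, not a definitive implementation: the dictionaries and function names are hypothetical, a single `ttl` value stands in for the top tag's TTL field, only the TTL fault is modeled, and the fresh TTL of 255 given to the ICMP-bearing packet is an assumption that the text does not specify.

```python
FRESH_TTL = 255  # assumed TTL for the packet carrying the ICMP message

# Incoming top tag -> (action, replacement tag, next hop), as in FIG. 3.
TABLES = {
    "P2": {"T2": ("swap", "T1", "P1")},
    "P1": {"T1": ("pop", None, "PE1")},
    "PE1": {"T3": ("pop", None, "CE1")},
}

def transit_hop(router, packet):
    """Process one tagged packet at router; return (packet_out, next_hop)."""
    ttl = packet["ttl"] - 1
    datagram = packet["datagram"]
    if ttl <= 0:
        # Replace the discarded datagram with one carrying the ICMP message,
        # addressed to the original source, and keep the received tag stack.
        datagram = {"src": router, "dst": datagram["src"], "icmp": "Lifetime Exceeded"}
        ttl = FRESH_TTL
    # Forward as though it were the original packet: same top-tag lookup.
    action, replacement, next_hop = TABLES[router][packet["tags"][0]]
    rest = packet["tags"][1:]
    tags = [replacement] + rest if action == "swap" else rest
    return {"tags": tags, "ttl": ttl, "datagram": datagram}, next_hop

def deliver(packet, first_router):
    """Follow the tag-specified path until the packet leaves the backbone."""
    visited, hop = [], first_router
    while hop in TABLES:
        visited.append(hop)
        packet, hop = transit_hop(hop, packet)
    return visited, hop, packet["datagram"]
```

In this sketch a packet whose TTL expires at P2 still travels P2, P1, PE1 and emerges at CE1, now carrying an ICMP message whose destination is the original source address "X"; CE1 can then route it by that address, and no transit router table gains an entry.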