1. Field of the Invention
This invention relates to the field of electronic digital communication, and more specifically, to devices and methods for routing data packets.
2. Description of Related Art
With everyone building Web Sites, Internet usage has been expanding at a rate more commonly associated with nuclear reactions. Internet traffic is exploding because of a growing number of users as well as a growing demand for bandwidth intensive data. Multimedia applications, for instance, can easily consume megabytes of bandwidth. To keep up with increased traffic, link speeds in the Internet core have been increased to 622 Mbps, and a number of vendors are providing faster routers.
A traditional router performs two major tasks in forwarding a packet: looking up the packet's destination address in the router database, and switching the packet from an incoming link to one of the outgoing links. With recent advances such as those discussed in N. McKeown et al., "The Tiny Tera: A Packet Switch Core," IEEE Micro, January/February 1997, pp. 26-33, and J. Turner, "Design of a Gigabit ATM Switch," Proc. SIGCOMM 97, October, 1997, the task of switching is well understood, and most vendors use fast buses or crossbar switches. Several new algorithms have been developed recently for address lookup as well. See, for example, Degermark et al., "Small Forwarding Tables for Fast Routing Lookups," Computer Communication Review, October, 1997; M. Waldvogel et al., "Scalable High Speed IP Routing Lookups," Proc. SIGCOMM 97, October 1997; S. Nilsson et al., "Fast Address Look-Up for Internet Routers," Proceedings of IEEE Broadband Communications 98, April, 1998; and V. Srinivasan et al., "Faster IP Lookups using Controlled Prefix Expansion," Proc. ACM Sigmetrics 98, June 1998. Thus it would appear that there is no inherent impediment to building Gigabit routers for traditional data forwarding in the Internet.
Increasingly, however, users are demanding, and some router vendors are providing, a more discriminating form of router forwarding. This new vision of forwarding is called Layer 4 Forwarding because routing decisions can be based on headers available at Layer 4 or higher in the OSI architecture. Layer 4 Switching offers increased flexibility: it gives a router the capability to block traffic from a dangerous external site, to reserve bandwidth for traffic between two company sites, and to give preferential treatment to one kind of traffic (e.g., online database transactions) over other kinds (e.g., Web browsing). Layer 4 switching is sometimes referred to in the vendor literature by the phrase "service differentiation". Traditional routers do not provide service differentiation because they treat all traffic going to a particular Internet address in the same way. Layer 4 Switching allows service differentiation because the router can distinguish traffic based on origin (source address) and application type (e.g., web traffic vs. file transfer).
Layer 4 Switching, however, does not come without some difficulties. First, a change in higher layer headers will require reengineering the routers, which is why routers have traditionally used only Layer 3 headers. Second, when data is encrypted for security, it is not clear how routers can get access to higher layer headers.
Despite these difficulties, several variants of Layer 4 switching have already evolved in the industry. First, many routers implement firewalls (see W. Cheswick et al., "Firewalls and Internet Security," Addison-Wesley, 1995) at trust boundaries, such as the entry and exit points of a corporate network. A firewall database consists of a series of packet filters that implement security policies. A typical policy may be to allow remote login from within the corporation, but to disallow it from outside the corporation. Second, the need for predictable and guaranteed service has led to proposals for reservation protocols like RSVP (L. Zhang et al., "RSVP: A New Resource Reservation Protocol," IEEE Networks Magazine, September 1993) that reserve bandwidth between a source and a destination. Third, the cries for routing based on traffic type have become more strident recently--for instance, the need to route web traffic between Site 1 and Site 2 on, say, Route A and other traffic on, say, Route B. FIGS. 1A and 1B illustrate some of these examples.
These figures schematically illustrate filters that provide traffic sensitive routing, a firewall rule, and resource reservation. The first filter routes video traffic from S1 to D via L1; not shown is the default routing to D which is via L2. The second filter blocks traffic from an experimental site S2 from accidentally leaving the site. The third filter reserves 50 Mbps of traffic from an internal network X to an external network Y, implemented perhaps by forwarding such traffic to a special outbound queue that receives special scheduling guarantees; here X and Y are prefixes.
Once users have gotten used to the flexibility and features provided by firewalls, traffic reservations, and QoS (Quality of Service) routing, it is hard to believe that future routers can ignore these issues. On the other hand, it seems clear that the ad hoc solutions currently being deployed are not the best, and cleaner and more general techniques are possible. For example, a cleaner solution to the traffic sensitive routing and reservation problem would be to push some form of "traffic classifier" into the routing header to determine application requirements without inspecting higher layer headers. But whatever the final solutions will be, it seems clear that future routers will need to forward at least some traffic based on a combination of destination address, source address and some other classifier fields, whether they are in the routing (Layer 3) or higher layer (Layers 4 and up) headers.
A typical database today contains only a few (10-100 typically) filters. However, if we consider that typical backbone routers now have 40,000 prefixes, and if we qualify each destination prefix with even a few port numbers (e.g., for QoS routing) or source prefixes (e.g., for resource reservation between sites in a Virtual Private Network), it is not hard to imagine the need for several hundred thousand filters. Today, even firewall processing with 10-100 filters is generally slow because of linear search through the filter set, but is considered an acceptable price to pay for "security". Thus the problem of finding the best matching filter for up to 100K filters at Gigabit speeds is an important challenge.
In traditional message forwarding in an Internet router, each router maintains a forwarding database, which is consulted by the router to determine the outgoing link on which the message is forwarded. The computational problem of determining the outgoing link based on the message's address is called the address lookup problem.
Consider a hypothetical fragment of the Internet linking users in Europe with users in the United States. If a user in Paris, named Source, as shown on the left in FIG. 2, wants to send an email message to another user in San Francisco, then Source will send its message to a router R1, say, in Paris. The Paris router may send this message on the communication link L4 to router R in London. The London Router R may then send the message on link L2 to router R3 in San Francisco, and finally R3 sends the message to the destination user.
Thus, a message travels from source to destination alternating between communication links and routers, just like a postal letter travels from post office to post office using a communication channel (such as airplanes). The important question is: How does each post office decide where to forward the letter? The post offices make these forwarding decisions using the destination addresses on the letters. In the same way, routers make their forwarding decisions based on the Internet destination address that is placed in an easily accessible portion of the message called a header. Each router, thus, is a special computer whose job is to forward all incoming messages towards their final destinations. The router uses its forwarding database, which is a table TFD in the router's computer memory, listing each possible destination and the corresponding output link. FIG. 3 shows a schematic of router R's forwarding database. For instance, when a message MSG arrives on link L4, carrying the destination address San Francisco, the router R forwards the message to link L2.
While there are several tasks a router performs in forwarding a message, the address lookup is one of the major bottlenecks at high speeds, and thus the subject of much current research. (The other tasks, such as switching, are better understood and have been optimized to satisfactory levels.)
There are far too many different internet addresses for each router to keep in its database. While storing each destination address explicitly in the database may greatly simplify the lookup problem, the memory requirement for this scheme at each router will be impractically enormous. Furthermore, since the Internet is a dynamic entity, which is always changing and evolving, the database needs to be updated frequently (in some cases several thousand times a day). Updating a large database containing all internet addresses is also infeasible at such a high frequency.
Instead, the forwarding databases are organized using a concept called address prefixes. Consider FIG. 4, which provides a different, more geographical representation of FIG. 2. The link L1 of router R is used to reach Boston as before, but Boston is also the "hub" for the entire USA--that is, we can reach any destination in the US from the hub router in Boston. Link L3 leads to California, from where a message can be sent directly to any location in California. Finally, we also have the direct link L2 from London to San Francisco.
This forwarding table compresses a very large number of table entries into one. For instance, while a naive database will have to maintain a separate entry for each and every destination in the US (possibly several thousand), the scheme as represented in FIG. 4 instead uses a default route, via Boston. In particular, all those cities in the US outside California (such as Denver, Kansas, Baltimore, St. Louis) do not need any explicit table entry--they are all reachable from London through Boston on link L1. Clearly, the reduction in the database size can be (and is) dramatic.
More specifically, the router database stores address prefixes, as in FIG. 5. The first entry is the default route for any US city, which is specified using the notation USA.*. Any city in California is specified as USA.CA.*, while San Francisco is written as USA.CA.SF. So, we have specified the whole forwarding database for router R in FIG. 2 using just 3 entries, rather than one per city in the US. Of course, now to send a message to San Francisco, we need to use the address USA.CA.SF, and not just SanFrancisco, but this is easy to do.
While the use of prefixes greatly reduces the memory needed to store the database, it makes the lookup problem more complicated. For one thing, an address can match multiple prefixes. If that happens, it should be intuitively clear that the forwarding should occur using the most specific or the longest prefix match. Thus a packet addressed to USA.CA.SF matches USA.*, USA.CA.*, as well as USA.CA.SF, but it should be forwarded to link L2 associated with the longest match USA.CA.SF. See FIG. 4. (This is because we have a direct link to San Francisco and should use it in place of a more indirect route through Boston.) A packet with address USA.CA.LA matches USA.* and USA.CA.*, but not USA.CA.SF. In this case, the most specific match is USA.CA.* and so the packet should be forwarded to the link L3.
In summary, routers achieve enormous savings in table size by compressing several address entries into one, with the use of prefixes. Unfortunately, this benefit comes with a price, in that the routers must now do the address lookup by solving a much more difficult problem, referred to as the longest matching prefix problem.
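The longest matching prefix rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three table entries and their links are taken from the example of FIGS. 4 and 5, while the function names `matches` and `lookup` and the label-by-label comparison are hypothetical choices made for this sketch, not the lookup method of the invention.

```python
from typing import List, Optional

# Forwarding table of router R from the text: a trailing "*" marks a prefix.
FORWARDING_TABLE = {
    "USA.*": "L1",      # default route for any US city, via Boston
    "USA.CA.*": "L3",   # any city in California
    "USA.CA.SF": "L2",  # direct link to San Francisco
}


def labels(prefix: str) -> List[str]:
    """Split an entry into its labels, dropping a trailing '*' wildcard."""
    parts = prefix.split(".")
    return parts[:-1] if parts[-1] == "*" else parts


def matches(prefix: str, address: str) -> bool:
    """True if the address falls under the prefix (or equals a full entry)."""
    stem = labels(prefix)
    addr = address.split(".")
    if prefix.endswith("*"):
        return addr[:len(stem)] == stem
    return addr == stem


def lookup(address: str) -> Optional[str]:
    """Return the link of the longest (most specific) matching entry."""
    best = None
    for prefix in FORWARDING_TABLE:
        if matches(prefix, address):
            if best is None or len(labels(prefix)) > len(labels(best)):
                best = prefix
    return FORWARDING_TABLE[best] if best is not None else None
```

With this table, `lookup("USA.CA.SF")` selects L2 and `lookup("USA.CA.LA")` selects L3, as in the text. Note that the sketch examines every entry, which is exactly the cost that practical lookup schemes try to avoid.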
Thus when a message to San Francisco arrives on link L4, router R looks up the destination address SanFrancisco in its forwarding table. Since the table says L2, the router then switches the entire message to the output link L2. It then proceeds to service the next arriving message. Notice that so far the word "lookup" is no different from looking up a word in a dictionary or a phone number in the phone book.
Thus the two main functions of a traditional router are to look up destination addresses (address lookup) and then to send the packet to the right output link (message switching). Both must be done at very high speeds. However, the Internet lookup problem is a lot harder than looking up a phone number in a phone book, or a word in a dictionary. In those problems, we can search quite rapidly by first sorting all the words or names. Once sorted, if we are looking for a word starting with Sea, we simply go to the pages of S entries and then look for words starting with Sea, etc. Clearly, such lookup is a lot faster than looking up all entries in a dictionary. In fact, such lookup is called exact matching lookup; standard solutions based on hashing and binary search provide very fast times for exact matching. The Internet lookup problem is a lot harder than dictionary search because Internet routers store address prefixes in their forwarding tables to reduce the size of their tables. However, the use of such address prefixes makes the lookup problem one of longest matching prefix instead of exact matching. The longest matching prefix problem is a lot harder. Before we explain why, let us digress briefly and explain why routers store prefixes in their tables.
While we used English words as addresses for illustration in the lookup example, in reality the Internet addresses are strings of bits. Each bit is either 0 or 1; a bit string is a sequence of bits; and the length of a string is the number of bits in it. For instance, 1011 is a string of length 4 and 1010100 is a string of length 7. Internet addresses come in two types. The current Internet (IPv4, for Internet Protocol, version 4) uses addresses that are bit strings of length 32. We often say that IPv4 uses 32 bit addresses. The next generation of Internet (IPv6, for Internet Protocol, version 6) uses 128 bit addresses. We will see that the longer length of IPv6 addresses will only make the lookup problem more difficult for the routers.
Except for this cosmetic difference of bits versus English characters, the Internet address lookup problem is exactly the best matching prefix problem described above. To provide a more concrete example, let us consider the table of Internet address prefixes shown in FIG. 6.
Suppose we have a 32 bit IPv4 destination address whose first 6 bits are 101100. In this case, the best matching prefix is Prefix P6 (10110*); there are two other matches, namely P1 and P4, but prefix P6 is the longest. Thus, any message to such a destination address should be sent to the output link corresponding to P6.
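For bit-string prefixes, one textbook way to find the best matching prefix is to walk a binary trie one bit at a time, remembering the last labeled node seen. The sketch below is a minimal illustration of that idea, not the method of the invention. Only P6 (10110*) is given in the text; the bit patterns assumed here for P1, P4, and P2 are hypothetical stand-ins, since FIG. 6 is not reproduced in this section.

```python
from typing import Dict, Optional


class TrieNode:
    """One node of a binary trie; `label` is set if a prefix ends here."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None


def build_trie(prefixes: Dict[str, str]) -> TrieNode:
    """Insert each bit-string prefix (without the trailing '*') into a trie."""
    root = TrieNode()
    for bits, label in prefixes.items():
        node = root
        for bit in bits:
            node = node.children.setdefault(bit, TrieNode())
        node.label = label
    return root


def longest_match(root: TrieNode, address_bits: str) -> Optional[str]:
    """Follow the address bit by bit and keep the deepest labeled node."""
    node, best = root, root.label
    for bit in address_bits:
        node = node.children.get(bit)
        if node is None:
            break
        if node.label is not None:
            best = node.label
    return best


# P6 is from the text; the other bit patterns are assumptions for illustration.
TABLE = {"10": "P1", "1011": "P4", "10110": "P6", "111": "P2"}
ROOT = build_trie(TABLE)
```

Under these assumed prefixes, a 32 bit address beginning 101100 passes P1 and P4 on the way down and ends at P6, which is the answer returned. The walk takes one step per address bit, so it grows with address length, which is one reason longer IPv6 addresses make the problem harder.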
Traditional routing, using destination address only, is sufficient for delivering Internet messages to their intended destinations, but it does not allow a router to distinguish between different kinds of messages going to the same address, say, D. For instance, casual web surfing to address D may be less important than access to a company database at the same address. A network manager may wish to give more preferential treatment to the latter traffic (such as more bandwidth or less congested routes) than to the former.
Layer 4 Switching allows such differentiated service by using additional header fields in making forwarding decisions. In particular, it performs the best matching prefix lookup on a combination of several header fields, such as destination address, source address, and application port numbers. This combination of various fields is called a packet filter. Otherwise, the general routing framework remains the same--there is an output link associated with the best matching filter and the message is switched to that output link. Packet filters come with two additional features. First, there may be a "block" characteristic associated with the filter, which causes any message matching this filter to be dropped (not be forwarded); this is useful for security and firewalls. Second, a filter may specify a special output queue for the corresponding link, which can be used to reserve bandwidth for certain types of messages.
Thus, having the message forwarding (lookup) depend on additional fields, especially those describing the type of application sending the message, allows us to provide differential service. However, we first need to understand the format of an Internet message and describe the relevant fields that can affect message forwarding. Besides the destination address (the computer the message is going to) used in traditional Internet forwarding, there are at least four additional important fields: Internet Source address (the computer that sent this message), Protocol Classifier (the type of Internet service requested by the message), Destination Port number (the extension of the Process within the Destination Computer that should handle this message), and Source Port number (the extension of the Process within the Source Computer that sent this message). See FIG. 7, which shows these fields in an Internet message. It will be recognized, however, that the invention described herein is applicable to other protocols as well; the Internet is used only for illustration. Also, the invention is not limited to use with these specific fields, and can use other fields, including application layer fields such as Web URLs (uniform resource locators). The 5 fields used for illustrative purposes are probably of the most immediate interest, however, and will be used for many of our examples.
The Destination Address is the Internet address of the destination computer to which the message is being sent. The Source Address is the Internet Address of the computer that sent the message. The Protocol field describes the Internet protocol type of this message. The two most common ones are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). By way of analogy, TCP corresponds to certified mail with return receipt (offering reliability), while UDP corresponds to ordinary mail. Some applications prefer the more reliable TCP service, while others prefer the cheaper (albeit less reliable) UDP service.
For some security-sensitive applications, it is helpful to recognize messages containing TCP acknowledgments (analogous to the return receipt of a certified mail) and to treat them differently. Thus, in our examples, we will sometimes have a third type of Protocol value called "TCP-ACK," which is a TCP message with an acknowledgment in it. There are also other types of less commonly used Internet protocols. We will not use these protocols in our examples, but it will become clear upon study of the detailed description and the figures that the present invention can handle arbitrary values of the Protocol field.
Both the TCP and UDP protocols also define so-called Destination and Source Port numbers, which can be thought of as telephone extensions. When you call, say, the Acme Widget Company at 314-555-1234, the accounting department may have Extension 12 and the sales department may have Extension 15. So, you dial 12 if you wish to settle your bills, and dial 15 to buy more widgets. Similarly, within a given Internet computer, a destination port number represents the "extension" of the process (application program) that should receive a message, and source ports represent the "extension" of the process that originated the message.
In terms of the format, the Destination and Source addresses are 32 bit strings, the Protocol field is an 8 bit string, and the Destination and Source Ports are 16 bit strings. (See FIG. 7.) We will often refer to a specific port number as an integer. For example, if we say the Destination Port is 23, we mean that it is the 16 bit string that encodes the integer 23. Similarly, when we say that the Protocol Field is TCP-ACK, we mean that the corresponding bit string in the message encodes this value.
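To make the field widths concrete, the following sketch pulls the five fields out of a raw IPv4 packet carrying TCP or UDP. It assumes the standard IPv4 layout (header length in the low four bits of byte 0, Protocol at byte 9, Source and Destination addresses at bytes 12 to 19) and that the two 16 bit ports open the transport header, source port first. The names `Header`, `parse_header`, and `build_sample_packet`, and the sample addresses, are hypothetical and chosen only for this illustration.

```python
import socket
import struct
from typing import NamedTuple


class Header(NamedTuple):
    dst_addr: str   # 32 bit Destination Address, dotted-decimal form
    src_addr: str   # 32 bit Source Address
    protocol: int   # 8 bit Protocol field (6 = TCP, 17 = UDP)
    dst_port: int   # 16 bit Destination Port
    src_port: int   # 16 bit Source Port


def parse_header(packet: bytes) -> Header:
    """Extract the five fields; valid only for TCP or UDP over IPv4."""
    ip_header_len = (packet[0] & 0x0F) * 4
    protocol = packet[9]
    src, dst = struct.unpack_from("!4s4s", packet, 12)
    src_port, dst_port = struct.unpack_from("!HH", packet, ip_header_len)
    return Header(socket.inet_ntoa(dst), socket.inet_ntoa(src),
                  protocol, dst_port, src_port)


def build_sample_packet() -> bytes:
    """Build a 40 byte IPv4 + TCP header with made-up field values."""
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"),     # source
                     socket.inet_aton("192.0.2.7"))    # destination
    tcp = struct.pack("!HHIIBBHHH",
                      1025, 23, 0, 0, 5 << 4, 0x10, 0, 0, 0)
    return ip + tcp
```

Parsing the sample packet yields Destination Port 23, that is, the 16 bit string encoding the integer 23, as described above. A real router would also have to handle fragments and non-TCP/UDP protocols, which this sketch ignores.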
Port numbers are important because they can tell a router the type of application that is sending this message. For instance, most electronic mail programs use a protocol called SMTP in which mail is sent to Destination Port 25. Most file transfer occurs using a protocol called FTP, which uses Destination Ports 20 or 21. The World Wide Web, by far the most common application, sends messages to Destination Port 80 (but occasionally also to other easily recognized substitutes like 81, 800, 8000, or 8080). Thus, by simply giving more bandwidth to messages addressed to Port 25 than to Port 80, we can give preference to electronic mail over Web Traffic. Similarly, replies sent by Web Servers have Source Port 80, which allows us to distinguish such traffic.
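As a small illustration of how port numbers reveal the application, the sketch below maps the ports named above to an application label. The port assignments come from the text; the function name `application` and the fallback label "other" are assumptions made for this example.

```python
# Ports named in the text; alternative Web ports such as 8080 are omitted.
WELL_KNOWN_PORTS = {
    20: "file transfer",
    21: "file transfer",
    25: "email",
    80: "web",
}


def application(dst_port: int, src_port: int) -> str:
    """Guess the application from the ports of a message.

    Requests carry the well-known port as Destination Port, while replies
    from a server carry it as Source Port, so both are checked.
    """
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "other"
```

A router could then assign bandwidth per label, for example favoring "email" over "web", although the policy itself is outside the scope of this sketch.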
The Protocol field is also important because certain Internet applications use TCP, while others use UDP, which allows us to distinguish between applications. Finally, the source address has obvious significance when we want to give preferential treatment to traffic depending on its origin.
Consider, for example, a company network that has two subnetworks, a corporate subnetwork and an engineering subnetwork. The company may choose to give priority to traffic originating from the corporate network. This is easily done, because in the Internet addressing scheme, all the addresses that are in the Corporate subnetwork will have a common prefix, say, P, while the engineering network addresses begin with a different prefix, say, Q. So, giving more bandwidth to traffic with source addresses matching the prefix P accomplishes the desired goal of assigning higher priority to the corporate subnet.
As a second example, suppose a university has traced several computer hacking incidents to a particular dormitory network (say, source address prefix S). The campus administrator may decide to restrict traffic from that dormitory network to only electronic mail. We can accomplish this by creating a packet filter that disallows any messages originating from the dormitory (source addresses that match S) whose destination port is not 25.
These examples show that the decision on how to forward a message can depend on several fields in the message. In general, it can depend on a combination of the five fields we described or even others. Each combination of fields for which a manager (or a routing protocol) requires special treatment needs to be specified by a rule or a filter. Any fields that are irrelevant in a rule can be wildcarded (specified using the "don't care" character "*").
For example, the two rules for handling the dormitory network could be as follows. Rule 1 could be of the form "If the Destination Address=*, the Source Address matches Prefix S, the Protocol Field is TCP, the Destination Port is 25 and the Source Port is *, then forward the message based on the Destination Address." Rule 2 could be of the form, "If the Destination Address=*, the Source Address matches Prefix S, the Protocol Field is TCP, the Destination Port is *, and the Source Port is *, then drop the message".
The intent of the second rule is to drop all messages sent from the dormitory network for purposes other than email. However, a message sent from the dorm for email can match both rules, so the rules conflict: Rule 1 says forward the message, while Rule 2 says drop it. The solution is to give each rule a unique priority or cost. When multiple rules match a message, the message is forwarded according to the lowest cost matching rule. Thus in our example, if Rule 1 is given a lower cost than Rule 2, email from the dorm will indeed be allowed through.
In most firewall databases, the rules are listed in a linear order: the cost of a rule is equal to its position in the order. Thus the lowest cost rule is the first rule in the database that matches the message. In the general Layer 4 Switching problem, there may be a more general notion of cost. Thus, each rule or filter has an arbitrary cost, and the lookup problem is to find the least cost matching filter.
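The dormitory example and the position-as-cost convention can be sketched as follows. The bit pattern used for the dormitory prefix and the names `Rule` and `first_match` are hypothetical; the Destination Address and Source Port fields, which are wildcards in both rules, are left out to keep the sketch short.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    src_prefix: str           # bit-string prefix of the Source Address
    protocol: str             # "*" is a wildcard
    dst_port: Optional[int]   # None is a wildcard
    action: str


DORM = "1100"  # hypothetical bit prefix standing in for the dormitory network

RULES = [
    Rule(DORM, "TCP", 25, "forward"),   # Rule 1: email from the dorm
    Rule(DORM, "TCP", None, "drop"),    # Rule 2: everything else from the dorm
]


def first_match(src_bits: str, protocol: str, dst_port: int,
                rules: List[Rule]) -> Optional[Tuple[int, str]]:
    """Return (cost, action) of the first matching rule; cost is its position."""
    for cost, rule in enumerate(rules, start=1):
        if (src_bits.startswith(rule.src_prefix)
                and rule.protocol in ("*", protocol)
                and rule.dst_port in (None, dst_port)):
            return cost, rule.action
    return None
```

With Rule 1 listed first, email from the dorm is forwarded and all other dorm traffic is dropped. Reversing the list would cause Rule 2 to match first and drop the email as well, which is why the ordering (or cost) must be chosen with care.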
Each rule could specify an exact match on a field (such as "Protocol=TCP"), a prefix match (such as "source addresses matching the Dormitory subnetwork S"), or a range match (such as "port numbers in the range 1024-64,000"). Port ranges are especially useful for firewall applications because some applications cannot be identified by a single port number but by a range of port numbers--for example, outgoing remote login can use any port number greater than 1023. We now describe the filter matching problem more precisely.
Traditionally, the rules for classifying a message are called filters, and the Layer 4 Switching problem is to determine the lowest cost matching filter for each incoming message at a router.
We assume that the information relevant to a lookup is contained in K distinct header fields in each message. These header fields are denoted H[1], H[2], . . . , H[K], where each field is a string of bits. For instance, the relevant fields for an IPv4 packet could be the Destination Address (32 bits), the Source Address (32 bits), the Protocol Field (8 bits), the Destination Port (16 bits), the Source Port (16 bits), and TCP flags (8 bits). The number of relevant TCP flags is limited, and so we prefer to combine the protocol and TCP flags into one field--for example, we can use TCP-ACK to mean a TCP packet with the ACK bit set. (TCP flags are important for packet filtering because the first packet in a connection does not have the ACK bit set while the others do. This allows a simple rule to block TCP connections initiated from the outside while allowing responses to internally initiated connections.) Other relevant TCP flags can be represented similarly; UDP packets are represented by H[3]=UDP.
Thus, the combination (D, S, TCP-ACK, 63, 125) denotes the header of an IP packet with destination D, source S, protocol TCP, destination port 63, source port 125, and the ACK bit set.
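Folding the ACK flag into the protocol field might be sketched as below. The protocol numbers (6 for TCP, 17 for UDP) and the position of the ACK bit (0x10 in the TCP flags byte) are standard values; the function names `protocol_class` and `blocks_inbound`, and the catch-all label "OTHER", are assumptions introduced for this illustration.

```python
TCP, UDP = 6, 17   # standard IP protocol numbers
ACK = 0x10         # ACK bit within the TCP flags byte
SYN = 0x02         # SYN bit, set on the first packet of a connection


def protocol_class(protocol: int, tcp_flags: int = 0) -> str:
    """Combine the Protocol field and TCP flags into a single field value."""
    if protocol == TCP:
        return "TCP-ACK" if tcp_flags & ACK else "TCP"
    if protocol == UDP:
        return "UDP"
    return "OTHER"


def blocks_inbound(protocol: int, tcp_flags: int = 0) -> bool:
    """A simple rule: block inbound TCP packets that lack the ACK bit.

    Such packets can only be connection attempts from the outside;
    replies to internally initiated connections always carry ACK.
    """
    return protocol_class(protocol, tcp_flags) == "TCP"


# Header tuple in the (destination, source, protocol, dst port, src port) form.
example_header = ("D", "S", protocol_class(TCP, ACK), 63, 125)
```

The last line reproduces the combination (D, S, TCP-ACK, 63, 125) from the text, with the symbolic names D and S kept as placeholders.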
The filter database of a Layer 4 Router consists of a finite set of filters, F_1, F_2, ..., F_N. Each filter is a combination of K values, one for each header field. Each field in a filter is allowed three kinds of matches: exact match, prefix match, or range match. (It is possible to extend the type of matches for greater flexibility, but the examples presented herein use these three most common types.) In an exact match, the header field of the packet should exactly match the filter field--for instance, this is useful for protocol and flag fields. In a prefix match, the filter field should be a prefix of the header field--this could be useful for blocking access from a certain subnetwork. In a range match, the header values should lie in the range specified by the filter--this can be useful for specifying port number ranges.
Each filter F_i has an associated directive act_i, which specifies how to forward the packet matching this filter. The directive specifies if the packet should be blocked. If the packet is to be forwarded, the directive specifies the outgoing link to which the packet is sent, and perhaps also a queue within that link if the message belongs to a flow with bandwidth guarantees.
We say that a packet P matches a filter F if each field of P matches the corresponding field of F--the match type is implicit in the specification of the field. For instance, if the destination field is specified as 1010*, then it requires a prefix match; if the protocol field is UDP, then it requires an exact match; if the port field is a range, such as 1024-1100, then it requires a range match. For instance, let F=(1010*, *, TCP, 1024-1080, *) be a filter, with act=block. Then, a packet with header (10101 . . . 111, 11110 . . . 000, TCP, 1050, 3) matches F, and is therefore blocked. The packet (10110 . . . 000, 11110 . . . 000, TCP, 80, 3), on the other hand, does not match F.
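The matching definition above, with the match type implied by how each filter field is written, can be sketched as follows. The encoding is an assumption made for illustration: a string ending in "*" is a prefix, a bare "*" is a wildcard, a (low, high) tuple is a range, and anything else is an exact value. The elided middle bits of the example addresses are filled with zeros purely so that the code runs.

```python
from typing import Tuple, Union

FieldSpec = Union[str, int, Tuple[int, int]]


def field_matches(spec: FieldSpec, value: Union[str, int]) -> bool:
    """Match one header field against one filter field."""
    if spec == "*":                                   # wildcard
        return True
    if isinstance(spec, tuple):                       # range match
        low, high = spec
        return low <= value <= high
    if isinstance(spec, str) and spec.endswith("*"):  # prefix match
        return str(value).startswith(spec[:-1])
    return spec == value                              # exact match


def packet_matches(filt: tuple, header: tuple) -> bool:
    """A packet matches a filter if every field matches."""
    return len(filt) == len(header) and all(
        field_matches(spec, value) for spec, value in zip(filt, header))


# Filter F from the text, with act = block.
F = ("1010*", "*", "TCP", (1024, 1080), "*")

# Example headers; the zero padding replaces the bits elided in the text.
blocked = ("10101" + "0" * 24 + "111", "11110" + "0" * 27, "TCP", 1050, 3)
passed = ("10110" + "0" * 27, "11110" + "0" * 27, "TCP", 80, 3)
```

Here `packet_matches(F, blocked)` holds, while `packet_matches(F, passed)` does not, because the destination does not begin with 1010 and port 80 lies outside 1024-1080.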
Since a packet may match multiple filters in the database, we associate a cost with each filter to determine an unambiguous match. So each filter F in the database is associated with a non-negative number, cost(F), and our goal is to find the filter with the least cost matching a packet's header. Our cost function generalizes the implicit precedence rules that are often used in practice to choose between multiple matching filters. In firewall applications, for instance, rules or filters are placed in the database in a specific linear order, where each filter takes precedence over a subsequent filter. Thus, the goal there is to find the first matching filter. Of course, we can get the same effect in our invention by making cost(F) equal to the position number of F in the database.
Several existing firewall implementations do a linear search of the database and keep track of the best matching filter. Some implementations use caching to improve performance--they cache full packet headers to speed up the processing of future lookups. The cache hit rate of caching full IP addresses in routers is at most 80-90% (C. Partridge, "Locality and Route Caches," in NSF Workshop on Internet Statistics Measurement and Analysis, San Diego, Calif., February 1996; P. Newman et al., "IP Switching and Gigabit Routers," IEEE Communications Magazine, January 1997) and cache hit rates are likely to be much worse for caching full headers. Incurring a linear search cost to search through 100,000 filters is a bottleneck even if it occurs on only 10 to 20% of the packets.
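The baseline just described, a linear scan that tracks the best match, fronted by a cache of full headers, might look like the sketch below. The class name `CachedClassifier`, the representation of a filter as a (cost, predicate, directive) triple, and the unbounded dictionary cache are all simplifying assumptions; a real cache would be bounded and would need invalidation when filters change.

```python
from typing import Callable, Dict, List, Optional, Tuple

Header = Tuple[str, str, str, int, int]
Filter = Tuple[int, Callable[[Header], bool], str]  # (cost, predicate, directive)


class CachedClassifier:
    """Least cost filter lookup by linear search, with a full-header cache."""

    def __init__(self, filters: List[Filter]) -> None:
        self.filters = list(filters)
        self.cache: Dict[Header, Optional[str]] = {}
        self.scans = 0  # number of full linear searches performed

    def least_cost(self, header: Header) -> Optional[str]:
        """Scan every filter and keep the matching one with the lowest cost."""
        best: Optional[Tuple[int, str]] = None
        for cost, predicate, directive in self.filters:
            if predicate(header) and (best is None or cost < best[0]):
                best = (cost, directive)
        return best[1] if best is not None else None

    def classify(self, header: Header) -> Optional[str]:
        """Serve repeated headers from the cache; scan only on a miss."""
        if header not in self.cache:
            self.scans += 1
            self.cache[header] = self.least_cost(header)
        return self.cache[header]
```

Every cache miss costs a pass over all N filters, so with N near 100,000 even a 10 to 20% miss rate leaves the linear search as the dominant cost, which is the point made above.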
The least cost matching filter can be thought of as a special case of a very general multidimensional searching problem. Several general solutions exist for the problem. In particular, each K-field filter can be thought of as a K-dimensional rectangular box, and each packet header can be thought of as a point in the K-dimensional space. The least cost filter matching problem is to find the least cost box containing the header point. A general result in Computational Geometry offers a data structure requiring O(N (log N)^(K-1)) space, and search time O((log N)^(K-1)), where the logarithms are to the base 2 (for instance, see Section 2.3 in F. P. Preparata et al., Computational Geometry: An Introduction, Springer-Verlag, New York, N.Y. 1985). Unfortunately, the worst-case search and memory costs of this data structure are infeasible, even for modest values of N and K. For instance, when N=10,000 and K=4, the worst-case search cost is at least 13^3 = 2197 and the memory cost is 2197N.
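The arithmetic behind these figures can be checked directly. The sketch below evaluates (log N)^(K-1) with the base 2 logarithm rounded down to an integer, which is how the figure of 13 arises for N=10,000; the function name `geometric_costs` is hypothetical, and the constant factors hidden by the O-notation are ignored.

```python
import math
from typing import Tuple


def geometric_costs(n: int, k: int) -> Tuple[int, int, int]:
    """Return (floor(log2 n), search cost, total memory) for the bound above."""
    log_n = math.floor(math.log2(n))   # log2(10,000) is about 13.29
    search_cost = log_n ** (k - 1)     # (log N)^(K-1) probes per lookup
    memory_cost = search_cost * n      # N (log N)^(K-1) entries
    return log_n, search_cost, memory_cost
```

For N=10,000 and K=4 this gives a search cost of 2197 and a memory cost of 2197N, or roughly 22 million entries, for a database of only ten thousand filters.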
Another possible technique is to generalize binary search by using a quad-tree-like construction in higher dimensions. (See, for instance, H. Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley, 1989.) Consider, for instance, destination-source filters, which correspond to a two-dimensional search. A filter F=(D, S) can be mapped to a quad-tree cell (i, j) if D is i bits long and S is j bits long. Now, we can try to do a binary search by first matching the packet with the filters in the quad-tree cell (W/2, W/2), where W is the maximum bit length of any destination or source prefix. The problem is that the probe outcome (fail or match) only eliminates one quadrant of the search space, and requires three recursive calls (not one, as in 1 dimension) to finish the search, which leads to a large search time. One possible way to avoid making three recursive calls is to precompute future matches using markers, but that leads to an infeasible memory explosion of 2^(W/2). We have also shown a lower bound on hashing schemes as in M. Waldvogel et al., "Scalable High Speed IP Routing Lookups," Proc. SIGCOMM 97, October 1997, to show that they generalize poorly to multiple dimensions.
In summary, we believe that all known existing methods lead to a large blowup in either memory or lookup time for the least cost filter problem. It would therefore be advantageous to provide routers and routing methods that require neither huge amounts of memory nor large lookup times.