Longest Prefix Matching (LPM) is the problem of determining, from a set of strings, the longest one that is a prefix of a given string. LPM is at the heart of many important applications. Internet Protocol (IP) routers routinely forward packets by finding in their routing tables the longest bit string that is a prefix of each packet's destination address. In copending U.S. patent application Ser. No. 09/603,154, filed Jun. 23, 2000 and entitled “Network-Aware Clustering of Web Clients,” Krishnamurthy and Wang describe a method for clustering Web clients by identifying a set of IP addresses that, with high probability, are under common administrative control and topologically close together. Such clustering information has applications ranging from network design and management to on-line quality-of-service differentiation based on the origin of a request. The proposed clustering approach is network-aware in that addresses are grouped according to prefixes in snapshots of Border Gateway Protocol (BGP) routing tables.
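To make the problem statement concrete, here is a minimal brute-force sketch in Python. It scans every stored prefix and keeps the longest one that matches, so its cost grows with the size of the prefix set; it illustrates the definition only and is not the retrie structure disclosed in this application. The function name and the sample bit strings are hypothetical.

```python
from typing import Iterable, Optional


def longest_prefix_match(prefixes: Iterable[str], s: str) -> Optional[str]:
    """Return the longest string in `prefixes` that is a prefix of `s`, or None."""
    best: Optional[str] = None
    for p in prefixes:
        # A candidate must be a prefix of s and longer than the best found so far.
        if s.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best


# Hypothetical routing prefixes written as bit strings, as in IP routing.
routes = ["1", "110", "11001", "0"]
print(longest_prefix_match(routes, "1100101000000000"))  # 11001
```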
Telephone network management and marketing applications often classify regions of the country by area codes or by combinations of area codes and the first few digits of local phone numbers. For example, the state of New Jersey is identified by area codes such as 201, 908, and 973. In turn, Morris County, New Jersey, is identified by longer telephone prefixes such as 908876 and 973360. These applications typically must compute, within seconds or minutes, summaries of calls originating and terminating at certain locations from daily streams of telephone calls, up to hundreds of millions of records at a time. That requires very fast classification of telephone numbers by finding their longest matching telephone prefixes.
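As an illustration only, the sketch below classifies a 10-digit telephone number by probing a hash table for its prefixes from longest to shortest, using the prefixes cited above. The hash-probing approach is an assumption chosen for clarity, not the method disclosed in this application: because telephone prefixes are bounded in length, a look-up needs at most one probe per possible prefix length, but each probe may be a separate memory reference, which is the kind of cost that fast LPM structures try to bound. The region labels, `REGIONS`, and `classify` are hypothetical.

```python
from typing import Optional

# Hypothetical region labels; the prefixes themselves are the ones cited in the text.
REGIONS = {
    "201": "New Jersey",
    "908": "New Jersey",
    "973": "New Jersey",
    "908876": "Morris County, NJ",
    "973360": "Morris County, NJ",
}
MAX_LEN = max(len(p) for p in REGIONS)


def classify(number: str) -> Optional[str]:
    """Return the region of the longest stored prefix of `number`, or None."""
    # Probe from the longest possible prefix down; the first hit is the longest match.
    for k in range(min(MAX_LEN, len(number)), 0, -1):
        region = REGIONS.get(number[:k])
        if region is not None:
            return region
    return None


print(classify("9088761234"))  # Morris County, NJ
print(classify("9085551234"))  # New Jersey
print(classify("2125551234"))  # None
```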
To maximize performance, LPM solutions must be designed with their intended use in mind. The LPM applications discussed above share some common characteristics:
1. Look-ups overwhelmingly dominate updates of the prefix sets. A router may route millions of packets before its routing table changes. Similarly, telephone number classifications rarely change, but hundreds of millions of phone calls are made daily.
2. The look-up rate is extremely demanding. IP routing and clustering may require worst-case LPM performance of a couple of hundred nanoseconds per look-up. That severely limits the number of machine instructions and memory references allowed per look-up.
3. Prefixes and strings are bounded in length and drawn from small alphabets. For example, current IP (IPv4) addresses are 32-bit strings, and U.S. telephone numbers are 10-digit strings.
The first two characteristics mean that certain theoretically appealing solutions, e.g., those based on suffix trees, string prefix matching, or dynamic string searching, are not applicable because their performance would not scale. Fortunately, the third characteristic means that specialized data structures can be designed to reach the desired performance levels. In the present application, the inventors disclose a system and method that can be generalized to bounded strings such as telephone numbers.
The present application discloses retries, a novel, fast, and compact data structure for LPM over general alphabets. Simulation experiments based on trace data from real applications show that retries outperform other published data structures for IP routing. By extending LPM to general alphabets, retries also enable new applications that could not exploit prior LPM solutions designed for IP look-ups.
The popularity of the Internet has made IP routing an important area of research, and several LPM schemes for binary strings have been disclosed in this context. For example, the idea of using ranges induced by prefixes to perform IP look-ups was suggested in B. Lampson, V. Srinivasan, & G. Varghese, “IP Lookups Using Multiway and Multicolumn Search,” IEEE/ACM Trans. Netwk., 7(3):324–34 (1999). The authors of that paper describe a method for routing Internet packets that exploits a simple relationship between IP prefixes and nested intervals of natural numbers. That method was later analyzed by Gupta, Prabhakar, and Boyd in “Near-Optimal Routing Lookups with Bounded Worst Case Performance,” Proc. 19th IEEE INFOCOM, vol. 3, p. 1184–92 (2000), with a view to guaranteeing worst-case performance. The method, however, does not generalize to other instances of LPM, such as non-binary strings and arbitrary alphabets.
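The relationship between prefixes and intervals can be sketched as follows. An IPv4 prefix of length L covers the 2^(32-L) consecutive addresses that share its first L bits, and any two such intervals are either nested or disjoint. Splitting the address space at every interval boundary yields elementary intervals, each with a single longest matching prefix that can be precomputed, so a look-up reduces to one binary search. This is a simplified sketch assuming CIDR-notation IPv4 prefixes and Python's standard `ipaddress` and `bisect` modules; it is not the multiway search structure of Lampson et al., and `RangeTable` and `prefix_range` are hypothetical names. Its precomputation is quadratic in the number of prefixes, which is acceptable only for illustration.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple


def prefix_range(cidr: str) -> Tuple[int, int, int]:
    """Map an IPv4 prefix to (low, high, length): the interval of addresses it covers."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address), net.prefixlen


class RangeTable:
    def __init__(self, cidrs: List[str]) -> None:
        ranges = [prefix_range(c) + (c,) for c in cidrs]
        # Each interval boundary (low, or one past high) starts an elementary interval.
        points = {0}
        for lo, hi, _, _ in ranges:
            points.add(lo)
            if hi + 1 < 2**32:
                points.add(hi + 1)
        self.starts = sorted(points)
        # Within an elementary interval the set of covering prefixes is constant,
        # so its longest matching prefix can be computed once, in advance.
        self.answers: List[Optional[str]] = []
        for s in self.starts:
            covering = [(length, c) for lo, hi, length, c in ranges if lo <= s <= hi]
            self.answers.append(max(covering)[1] if covering else None)

    def lookup(self, addr: str) -> Optional[str]:
        a = int(ipaddress.IPv4Address(addr))
        # Binary search for the last elementary interval starting at or before a.
        i = bisect.bisect_right(self.starts, a) - 1
        return self.answers[i]


table = RangeTable(["10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24", "192.168.0.0/16"])
print(table.lookup("10.1.2.3"))    # 10.1.2.0/24
print(table.lookup("10.1.9.9"))    # 10.1.0.0/16
print(table.lookup("10.200.0.1"))  # 10.0.0.0/8
print(table.lookup("8.8.8.8"))     # None
```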
Multi-level table look-up schemes are used in both hardware and software implementations of IP routing. Since modern machines use memory hierarchies whose levels can differ dramatically in performance, some implementations attempt to build data structures that conform to the memory hierarchy at hand. Both the LC (level compression)-trie scheme of S. Nilsson & G. Karlsson, “IP-Address Lookup Using LC-Tries,” IEEE J. Sel. Area. Comm., 17(6):1083–92 (1999), and the multi-level table of V. Srinivasan & G. Varghese, “Fast Address Lookup Using Controlled Prefix Expansion,” ACM Trans. Comp. Sys., 17(1):1–40 (1999), attempt to optimize for L2 caches by adjusting the number of levels to minimize space usage. Their efficient implementations, however, exploit the binary alphabet of IP addresses and prefixes.
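A multi-level table with prefix expansion can be sketched as follows. Each level consumes a fixed number of address bits (its stride), and a prefix whose length falls between level boundaries is expanded into every slot it covers at the next boundary, so a look-up touches at most one slot per level. This is a simplified sketch in the spirit of controlled prefix expansion, not the published implementation: the strides 16, 8, 8 are a hypothetical choice that is not tuned to any cache, and `Node`, `MultiLevelTable`, and their methods are hypothetical names. The shift-and-mask indexing relies on the addresses being 32-bit binary strings.

```python
import ipaddress
from typing import List, Optional

# Hypothetical stride choice: bits consumed at each level, summing to 32.
STRIDES = [16, 8, 8]


class Node:
    """One table level: each slot may hold an expanded prefix and a child table."""

    def __init__(self, stride: int) -> None:
        size = 1 << stride
        self.hop: List[Optional[str]] = [None] * size  # next hop stored in this slot
        self.hop_len: List[int] = [-1] * size          # original length of that prefix
        self.child: List[Optional["Node"]] = [None] * size


class MultiLevelTable:
    def __init__(self) -> None:
        self.root = Node(STRIDES[0])

    def insert(self, cidr: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(cidr)
        bits, length = int(net.network_address), net.prefixlen
        node, consumed = self.root, 0
        for level, stride in enumerate(STRIDES):
            end = consumed + stride
            index = (bits >> (32 - end)) & ((1 << stride) - 1)
            if length <= end:
                # Expand the prefix to this level's boundary: it fills 2^(end - length) slots.
                span = 1 << (end - length)
                first = index & ~(span - 1)
                for i in range(first, first + span):
                    if length >= node.hop_len[i]:  # a longer original prefix wins
                        node.hop[i], node.hop_len[i] = next_hop, length
                return
            # The prefix extends past this level: descend, creating the child table.
            if node.child[index] is None:
                node.child[index] = Node(STRIDES[level + 1])
            node, consumed = node.child[index], end

    def lookup(self, addr: str) -> Optional[str]:
        a = int(ipaddress.IPv4Address(addr))
        node: Optional[Node] = self.root
        consumed, best = 0, None
        for stride in STRIDES:
            if node is None:
                break
            end = consumed + stride
            index = (a >> (32 - end)) & ((1 << stride) - 1)
            if node.hop[index] is not None:
                best = node.hop[index]  # deeper levels hold strictly longer prefixes
            node, consumed = node.child[index], end
        return best


table = MultiLevelTable()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.16.0/20", "B")
table.insert("10.1.17.128/25", "C")
print(table.lookup("10.1.17.200"))  # C
print(table.lookup("10.1.17.5"))    # B
print(table.lookup("10.9.9.9"))     # A
print(table.lookup("8.8.8.8"))      # default
```

With these strides every look-up performs at most three table accesses regardless of the number of prefixes, at the cost of the memory taken by expanded entries; choosing the number of levels and their strides is exactly the space trade-off that the cited schemes tune.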
G. Cheung & S. McCanne, “Optimal Routing Table Design for IP Address Lookup Under Memory Constraints,” Proc. 18th IEEE INFOCOM, vol. 3, p. 1437–44 (1999), took a more general approach to memory hierarchies that includes the use of prefix popularity. Those authors consider a multi-level table scheme similar to retries and attempt to minimize the space usage of popular tables so that they fit into the fast caches. Because cache sizes are limited, they must solve a complex constrained optimization problem to find the right data structure. However, L1 caches on most machines are very small, so much of the gain comes from fitting a data structure into the L2 cache. In addition, prefix popularity is a dynamic property and is not easy to approximate statically. Cheung & McCanne also do not focus on bounding the number of memory accesses or minimizing memory usage.
P. Crescenzi, L. Dardini & R. Grossi, in “IP Address Lookup Made Fast and Simple,” Proc. 7th ESA, vol. 1643 of LNCS, p. 65–76, Springer-Verlag (1999), proposed a compressed-table data structure for IP look-up. The key idea is to identify runs induced by common next hops among the 2^32 implicit prefixes and to use that information to compress the entire table. The technique works well when the number of distinct next hops is small and there are few runs, which is mostly the case in IP routing. The compressed-table data structure is fast because it bounds the number of memory accesses per match. Unfortunately, in network clustering applications, both the number of distinct next-hop values and the number of runs can be quite large, so the technique is not practical in such applications.
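The run-compression idea can be illustrated on a toy address space. The sketch below assumes hypothetical 8-bit addresses instead of 32-bit ones, expands a small prefix set into the next hop of every one of the 2^8 addresses, collapses maximal runs of equal next hops, and answers a look-up with a binary search over the run starts. It shows only why the number of runs determines the size of the compressed table; it is not the actual data structure of Crescenzi, Dardini & Grossi, and `W`, `expand`, `compress`, and `lookup` are hypothetical names.

```python
import bisect
from typing import Dict, List, Optional, Tuple

W = 8  # hypothetical toy address width; the text's setting is W = 32


def expand(prefixes: Dict[Tuple[int, int], str]) -> List[Optional[str]]:
    """Next hop of each of the 2**W full-length addresses; keys are (prefix bits, length)."""
    table: List[Optional[str]] = [None] * (1 << W)
    # Apply shorter prefixes first so that longer (more specific) ones overwrite them.
    for (bits, length), hop in sorted(prefixes.items(), key=lambda kv: kv[0][1]):
        lo = bits << (W - length)
        for a in range(lo, lo + (1 << (W - length))):
            table[a] = hop
    return table


def compress(table: List[Optional[str]]) -> Tuple[List[int], List[Optional[str]]]:
    """Collapse maximal runs of equal next hops into parallel lists of starts and hops."""
    starts: List[int] = []
    hops: List[Optional[str]] = []
    for a, hop in enumerate(table):
        if not hops or hop != hops[-1]:
            starts.append(a)
            hops.append(hop)
    return starts, hops


def lookup(starts: List[int], hops: List[Optional[str]], addr: int) -> Optional[str]:
    # The run containing addr is the last one starting at or before it.
    return hops[bisect.bisect_right(starts, addr) - 1]


prefixes = {(0b0, 0): "X", (0b1, 1): "Y", (0b1010, 4): "X", (0b10101, 5): "Z"}
starts, hops = compress(expand(prefixes))
print(len(starts))                # 5
print(lookup(starts, hops, 170))  # Z
```

Here four prefixes with three distinct next hops yield five runs out of 256 entries. When there are many distinct next-hop values, as in network clustering, runs tend to be shorter and more numerous, and the compressed table grows accordingly.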
It is therefore desirable to provide a method for fast prefix matching that can be generalized to arbitrary alphabets, uses a limited amount of memory, and does not require solving complex optimization problems.