The exploding number of applications with high bandwidth requirements—such as video conferencing and on-demand video—is resulting in a steep growth in internet traffic. This explosive growth in traffic volume is compounded by the fact that the number of internet hosts is also growing dramatically. As can be appreciated, widespread deployment of next generation transmission facilities—such as the OC768 standard of 40 Gbps—will translate into better end-to-end performance only if the performance of devices such as network routers improves along with necessary increases in routing table sizes, line rates, and the volume of per-packet processing.
Longest Prefix Matching (LPM) is a technique that has become a fundamental part of IP-lookup, packet classification, intrusion detection and other packet-processing tasks that are performed by a router. As is known, a prefix is a binary string of a particular length followed by a number of wildcard bits. IP-lookup amounts to finding the longest matching prefix among all prefixes in a routing table. Packet classification involves finding the best matching rule for a packet among a set of rules. Because each rule has multiple fields, packet classification is essentially a multiple-field extension of IP-lookup and can be performed by combining building blocks of LPM for each field (See, for example, V. Srinivasan, G. Varghese, S. Suri and M. Waldvogel, “Fast and Scalable Layer-4 Switching”, In Proceedings of ACM SIGCOMM 1998).
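By way of illustration only, the definition of longest prefix matching given above can be sketched as a brute-force scan. The sketch assumes prefixes are held as strings of '0' and '1' characters with the wildcard bits simply omitted; the function name, the table contents and the next-hop labels are hypothetical and are not taken from any cited scheme.

```python
def longest_prefix_match(table, key_bits):
    """Return (prefix, next_hop) for the longest prefix matching key_bits.

    table    : dict mapping a prefix bit-string (wildcards omitted) to a next hop
    key_bits : fully specified key as a bit-string, e.g. "10110000"
    """
    best = None
    for prefix, next_hop in table.items():
        # A prefix matches when the key begins with its specified bits.
        if key_bits.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, next_hop)
    return best


# Hypothetical three-entry routing table.
table = {"10": "A", "1011": "B", "0": "C"}
result = longest_prefix_match(table, "10110000")
```

Both "10" and "1011" match the key "10110000" in this example, and the longer of the two is selected. The scan visits every entry, which is the cost that practical LPM schemes seek to avoid.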
Presently, there exist three major techniques for performing LPM, namely: Ternary Content Addressable Memories (TCAM), trie-based schemes, and hash-based schemes. Ternary Content Addressable Memories are custom, non-commodity devices that simultaneously compare an incoming query with every prefix stored in memory. Due to their custom, non-commodity nature and “brute-force” searching method, the cost and/or power dissipation of TCAMs is prohibitive for large tables and high line rates.
Trie-based schemes use a tree-like data structure to match a query, successively a few bits at a time, against prefixes in a table. Due to this method of matching, the lookup latency depends on the length of the prefixes. For long prefixes, such as those used with IPv6, the worst-case lookup latency becomes considerable, leading to design complications (e.g., larger buffers, deep and complex pipelines) in high-bandwidth networks. Furthermore, a trie-based scheme requires space to hold pointers from nodes to their children, resulting in large memory usage. Even in state-of-the-art trie schemes like Tree Bitmap (See, e.g., Will Eatherton, George Varghese and Zubin Dittia, “Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates”, ACM SIGCOMM Computer Communication Review 34(2), 2004) the necessary data structure is quite large, requiring that the trie be stored off-chip. Such off-chip designs are undesirable for a number of reasons including long latency, poor performance, high power and design complexity.
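The dependence of trie lookup latency on prefix length can be seen in a minimal sketch of a one-bit-at-a-time binary trie. This is a plain unibit trie chosen for simplicity, not Tree Bitmap or any other cited scheme, and the names TrieNode, trie_insert and trie_lookup are hypothetical.

```python
class TrieNode:
    """One trie node: two child pointers plus an optional next hop."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]   # pointer storage paid at every node
        self.next_hop = None


def trie_insert(root, prefix, next_hop):
    """Insert a prefix given as a bit-string with wildcard bits omitted."""
    node = root
    for bit in prefix:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop


def trie_lookup(root, key_bits):
    """Return (next_hop, steps), where steps counts the nodes descended."""
    node, best, steps = root, root.next_hop, 0
    for bit in key_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        steps += 1
        if node.next_hop is not None:
            best = node.next_hop       # remember the longest match so far
    return best, steps


root = TrieNode()
trie_insert(root, "10", "A")
trie_insert(root, "1011", "B")
```

In this sketch a key matching the 4-bit prefix requires four node visits while a key matching only the 2-bit prefix requires two, so the number of dependent memory accesses grows with prefix length. Every node also carries two child pointers, which is the storage overhead noted above.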
Hash-based schemes, however, do not perform brute-force searches like TCAMs and consequently may require an order of magnitude less power. In addition, and quite unlike tries, hash-based schemes employ a flat data structure, thereby permitting smaller memory sizes that are amenable to on-chip storage, as well as key-length-independent O(1) latencies.
Inasmuch as a transition to IPv6 may well preclude tries and TCAMs as LPM solutions for high line rates and large tables, an efficient hash-based scheme may offer promise. Regardless of future transition however, a superior hash-based scheme may be an invaluable asset to present day routers as well.
Despite such promise, however, there are at least two significant problems that obstruct the practical deployment of any hash-based scheme for LPM. First, hash tables inherently have collisions and necessarily use techniques like chaining to deal with them. As a result, lookup rates for hash tables are unpredictable and sensitive to the set of prefixes in the table. Since systems which employ hash schemes (i.e., routers) oftentimes must guarantee the worst-case lookup rate as dictated by the line rate, such unpredictability is quite undesirable.
Unfortunately, reducing the probability of collisions (See, e.g., Haoyu Song, Sarang Dharmapurikar, J. Turner and J. Lockwood, “Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing”, Proceedings of the Annual ACM SIGCOMM 2005) does not guarantee a worst-case lookup rate as demanded by a line rate, and consequently a router employing such an improved scheme is vulnerable to denial-of-service attacks (See, e.g., V. P. Kumar, T. V. Lakshman and D. Stiliadis, “Beyond Best Effort: Router Architectures for the Differentiated Services of Tomorrow's Internet”, IEEE Communications Magazine, May 1998). Furthermore, even infrequent collisions produce variable lookup latencies, thereby requiring complicated queueing and stalling mechanisms in the router pipeline. Finally, in order to reduce the probability of collisions, large tables are required, which necessitate off-chip storage for most of the data structure, thereby compounding the power dissipation and off-chip bandwidth problems previously described.
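The first problem, namely that chaining makes lookup cost depend on which keys happen to collide, can be illustrated with a small sketch. The class ChainedHashTable, its bucket count and the integer keys are hypothetical; integer keys are assumed so that the bucket index (key modulo the number of buckets) is easy to follow by hand.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining within each bucket."""

    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # For small non-negative Python ints, hash(key) == key.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        self.buckets[self._index(key)].append((key, value))

    def lookup(self, key):
        """Return (value, probes); probes counts chain entries examined."""
        probes = 0
        for stored_key, value in self.buckets[self._index(key)]:
            probes += 1
            if stored_key == key:
                return value, probes
        return None, probes


ht = ChainedHashTable(num_buckets=4)
for key, value in [(1, "a"), (5, "b"), (9, "c"), (2, "d")]:
    ht.insert(key, value)   # keys 1, 5 and 9 all fall into bucket 1
```

Here the key 2 is found in one probe, whereas the key 9 requires three because it shares a bucket with 1 and 5. The worst-case probe count is therefore determined by the contents of the table rather than by the design, which is the unpredictability described above.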
The second problem associated with hash-based schemes for LPM results from the fact that with LPM the keys being searched are fully specified y-bit values, whereas the prefixes originally inserted have a shorter length x (x<y) and end in y-x wildcard bits. Because hash functions cannot operate on wildcard bits, and because assuming a specific bit value for the wildcard bits may cause erroneous search results, a separate hash table is required for each prefix length x. Consequently, a search must look up multiple tables and pick the longest matching prefix.
For both on-chip and off-chip implementations of the tables, each additional hash table requires more memory banks or ports, pins for bandwidth (if off-chip), power, wiring, and arbitration logic. And while IPv4 would require as many as 32 hash tables, IPv6 would require up to 128 tables.
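The one-table-per-prefix-length organization described above can be sketched as follows. Python dictionaries stand in for the hash tables, the tables are probed sequentially from the longest length down (a hardware implementation would more likely probe them in parallel), and the names build_tables and hash_lpm are hypothetical.

```python
def build_tables(prefixes):
    """Group prefixes (bit-strings, wildcards omitted) into one table per length."""
    tables = {}
    for prefix, next_hop in prefixes.items():
        tables.setdefault(len(prefix), {})[prefix] = next_hop
    return tables


def hash_lpm(tables, key_bits):
    """Probe each per-length table with the key truncated to that length.

    Returns (length, next_hop) for the longest hit, or None if no table hits.
    """
    for length in sorted(tables, reverse=True):
        if length > len(key_bits):
            continue
        hit = tables[length].get(key_bits[:length])
        if hit is not None:
            return length, hit
    return None


# Hypothetical routing table with three distinct prefix lengths.
tables = build_tables({"10": "A", "1011": "B", "0": "C"})
```

Three distinct prefix lengths yield three tables in this example. Truncating the key to x bits before hashing is what removes the wildcard bits from consideration, and it is also why one table per length is needed.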
One approach for reducing the number of unique prefix lengths is known as controlled prefix expansion (CPE) (See, e.g., V. Srinivasan and G. Varghese, “Faster IP Lookups Using Controlled Prefix Expansion”, ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, 1998). Controlled Prefix Expansion converts a prefix of length x into a number of prefixes of longer length x+l (where l≥1) by expanding l of its wildcard bits into their 2^l possibilities. In so doing, CPE inflates the number of prefixes by a factor of 2^(average expansion length), adversely affecting storage space. The fewer the unique prefix lengths desired, the larger the explosion. The result is that for a routing table of 100K prefixes, a system based on average-case design may actually have to accommodate 500K prefixes, whereas a worst-case design may be impossible to implement.
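The expansion step of CPE can be sketched as follows, again with prefixes held as bit-strings. The function names and the choice of target lengths are hypothetical, every prefix is assumed to be no longer than the largest target length, and the sketch makes no attempt to choose the target lengths optimally as the cited work does.

```python
from itertools import product


def expand_prefix(prefix, target_len):
    """Expand l = target_len - len(prefix) wildcard bits into 2**l prefixes."""
    l = target_len - len(prefix)
    if l < 0:
        raise ValueError("target length is shorter than the prefix")
    return [prefix + "".join(bits) for bits in product("01", repeat=l)]


def controlled_prefix_expansion(prefixes, target_lengths):
    """Expand every prefix to the nearest allowed length that is >= its own."""
    targets = sorted(target_lengths)
    expanded = {}
    # Shorter originals are processed first so that a longer, more specific
    # original overwrites any expanded entry that it coincides with.
    for prefix, next_hop in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
        target = next(t for t in targets if t >= len(prefix))
        for p in expand_prefix(prefix, target):
            expanded[p] = next_hop
    return expanded


# Three original lengths (1, 2, 3) reduced to two allowed lengths (2, 3).
expanded = controlled_prefix_expansion({"1": "A", "10": "B", "101": "C"}, [2, 3])
```

In this example the 1-bit prefix "1" is expanded to "10" and "11"; the entry "10" is then claimed by the original 2-bit prefix, so only "11" survives with the next hop of "1". A single prefix expanded by l bits produces 2^l entries, which is the source of the storage inflation described above.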