§1.1. Field of the Invention
Embodiments consistent with the claimed invention concern Internet Protocol (“IP”) networks. In particular, embodiments consistent with the claimed invention concern hash-based route lookup methods and apparatus.
§1.2. Background Information
In IP route lookup, a system (such as a router, for example) extracts each incoming packet's destination IP address and performs a longest prefix match with stored routes. Ternary content-addressable memory (“TCAM”) based schemes are widely used in midrange routers. (See, e.g., the articles: F. Zane, G. Narlikar and A. Basu, “CoolCAMs: Power-Efficient TCAMs for Forwarding Engines,” in Proc of INFOCOM, Vol. 1, pp. 42-52, (2003); and K. Zheng, C. Hu, H. Lu and B. Liu, “A TCAM-Based Distributed Parallel IP Lookup Scheme and Performance Analysis,” IEEE/ACM Transactions on Networking, Vol. 14, No. 4, pp. 863-875 (2006), each of which is incorporated herein by reference.) Unfortunately, however, their high cost and large power consumption make them unattractive for high-end routers such as so-called core routers.
Direct lookup schemes can use standard SRAM or DRAM to store the next hop for each prefix, in a table or multiple tables that are addressed by the prefix. However, such schemes are only effective for short address lookups (e.g., less than 16 bits), and are not practical for longer lookups due to prefix expansion. (See, e.g., the articles: P. Gupta, S. Lin and N. McKeown, “Routing Lookups in Hardware at Memory Access Speeds,” in Proc of the IEEE Computer and Communications Societies (INFOCOM 1998), Vol. 3, pp. 1240-1247 (March/April 1998); N.-F. Huang and S.-M. Zhao, “A Novel IP-Routing Lookup Scheme and Hardware Architecture for Multigigabit Switching Routers,” IEEE Journal on Selected Areas in Comm, Vol. 17, No. 6, pp. 1093-1104 (June 1999); N.-F. Huang, S.-M. Zhao, J.-Y. Pan and C.-A. Su, “A Fast IP Routing Lookup Scheme for Gigabit Switching Routers,” in Proc. of the IEEE Computer and Communications Societies (INFOCOM 1999), Vol. 3, pp. 1429-1436, (March 1999); and V. Srinivasan and G. Varghese, “Fast Address Lookups using Controlled Prefix Expansion,” ACM Transactions on Computer Systems, Vol. 17, No. 1, pp. 1-40, (1999), each of which is incorporated herein by reference.)
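By way of illustration only, the following minimal Python sketch shows why prefix expansion makes direct lookup impractical for long prefixes. The function names (expand_prefix, build_direct_table, direct_lookup) and the toy 8-bit address width are assumptions made for this example and are not taken from the cited articles.

```python
def expand_prefix(value, length, stride):
    """Return every stride-bit table index covered by a prefix.

    `value` holds the prefix bits as an integer and `length` is the
    number of valid bits. A prefix shorter than the stride is expanded
    into 2**(stride - length) table entries.
    """
    if length > stride:
        raise ValueError("prefix is longer than the table index")
    free_bits = stride - length
    base = value << free_bits
    return range(base, base + (1 << free_bits))


def build_direct_table(routes, stride):
    """Build a 2**stride entry table addressed directly by prefix bits."""
    table = [None] * (1 << stride)
    # Insert shorter prefixes first so that longer (more specific)
    # prefixes overwrite them, which preserves longest prefix match.
    for value, length, next_hop in sorted(routes, key=lambda r: r[1]):
        for index in expand_prefix(value, length, stride):
            table[index] = next_hop
    return table


def direct_lookup(table, address, address_bits, stride):
    """Index the table with the top `stride` bits of the address."""
    return table[address >> (address_bits - stride)]


# Toy example: 8-bit addresses, 4-bit table index.
routes = [(0b1, 1, "A"), (0b101, 3, "B")]
table = build_direct_table(routes, stride=4)
```

In this sketch the table always holds 2**stride entries regardless of how many routes exist, so a 24-bit index would already require 16,777,216 entries.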
To avoid the prohibitively large memory requirements of direct lookup schemes due to prefix expansion, hash-based lookup schemes have been proposed. (See, e.g., the references: S. Cadambi, S. Chakradhar, and H. Shibata, “Prefix Processing Technique for Faster IP Routing,” U.S. Pat. No. 7,398,278; S. Kaxiras and G. Keramidas, “IPStash: A Set-associative Memory Approach for Efficient IP-Lookup,” in Proc. of INFOCOM, Vol. 2, pp. 992-1001 (2005); J. Hasan, S. Cadambi, V. Jakkula and S. Chakradhar, “Chisel: A Storage-efficient, Collision-free Hash-based Network Processing Architecture,” in Proc of ISCA, pp. 203-215 (2006); H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood, “Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing,” in Proc. of SIGCOMM, pp. 181-192 (2005); S. Dharmapurikar, P. Krishnamurthy and D. E. Taylor, “Longest Prefix Matching Using Bloom Filters,” IEEE/ACM Transactions on Networking, Vol. 14, No. 2, pp. 397-409 (2006); H. Song, F. Hao, M. Kodialam and T. Lakshman, “IPv6 Lookups using Distributed and Load Balanced Bloom Filters for 100 Gbps Core Router Line Cards,” in Proc of INFOCOM, pp. 2518-2526 (2009); and M. Bando, N. S. Artan, and H. J. Chao, “FlashLook: 100 Gbps Hash-Tuned Route Lookup Architecture,” in Proc. of HPSR, 2009, each of which is incorporated herein by reference.) Whether a hash function is applied to every prefix length or only to certain prefix lengths (e.g., /16, /24 and /32 for IPv4), the prefixes are hashed into a table. Various methods have been proposed to reduce the number of prefixes hashed to the same entry of the hash table. Bloom filters are sometimes used to query the existence of a prefix before finding the next hop information (“NHI”) of the prefix.
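As a rough illustration of the general idea, and not of any particular cited scheme, the following Python sketch keeps one hash table per prefix length and probes the tables from the longest length to the shortest. A Python dict stands in for the hardware hash table, the Bloom filter pre-check is omitted, and the function names build_tables and hash_lookup are assumptions made for this example.

```python
import ipaddress


def build_tables(routes):
    """Build one hash table per prefix length.

    `routes` is an iterable of (CIDR string, next hop information) pairs.
    Each table is keyed by the prefix bits, right-aligned as an integer.
    """
    tables = {}
    for cidr, nhi in routes:
        net = ipaddress.ip_network(cidr)
        key = int(net.network_address) >> (32 - net.prefixlen)
        tables.setdefault(net.prefixlen, {})[key] = nhi
    return tables


def hash_lookup(tables, address):
    """Return the NHI of the longest matching prefix, or None."""
    addr = int(ipaddress.IPv4Address(address))
    # Probe the longest prefix length first; the first hit is the
    # longest prefix match.
    for length in sorted(tables, reverse=True):
        nhi = tables[length].get(addr >> (32 - length))
        if nhi is not None:
            return nhi
    return None


tables = build_tables([
    ("10.0.0.0/8", "A"),
    ("10.1.0.0/16", "B"),
    ("10.1.2.0/24", "C"),
])
```

With these example routes, hash_lookup(tables, "10.1.2.3") returns "C", while an address matching no stored prefix returns None.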
Hardware trie-based schemes can achieve high throughput. However, they require many memory chips in parallel to accommodate the pipelined stages required by the many levels of the trie (which has a height proportional to the number of bits in the IP address). (See, e.g., the articles: W. Eatherton, G. Varghese and Z. Dittia, “Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates,” ACM SIGCOMM Computer Communication Review, Vol. 34, No. 2, pp. 97-122 (2004); S. Sikka and G. Varghese, “Memory-Efficient State Lookups with Fast Updates,” in Proc. of SIGCOMM 2000 pp. 335-347 (2000); R. Sangireddy, N. Futamura, S. Aluru and A. K. Somani, “Scalable, Memory Efficient, High-Speed IP Lookup Algorithms,” IEEE/ACM Transactions on Networking, Vol. 13, No. 4, pp. 802-812 (2005); H. Song, J. Turner, and J. Lockwood, “Shape Shifting Tries for Faster IP Route Lookup,” in Proc. of ICNP, 2005; A. Basu and G. Narlikar, “Fast Incremental Updates for Pipelined Forwarding Engines,” IEEE/ACM Transactions on Networking, Vol. 13, No. 3, pp. 690-703 (2005); and W. Jiang and V. K. Prasanna, “Multi-Terabit IP Lookup Using Parallel Bidirectional Pipelines,” in Proc. of CF, pp. 241-250 (2008), each of which is incorporated by reference.) This is especially a problem for IPv6, which has a larger number of bits in the address.
Multibit-trie architectures, such as Tree Bitmap, have gained much attention because they can reduce the number of pipeline stages, and because of their efficient data structures. Each Tree Bitmap node contains two pieces of information: (1) an Internal Bitmap of the sub-trie and a pointer for the NHI; and (2) an External Bitmap for a head pointer to the block of child nodes and a bitmap for child sub-tries. As a result, one lookup requires multiple off-chip memory accesses. To reduce the number of off-chip memory accesses, H. Song et al. proposed Shape Shifting Tries (“SST”), which allow the number of trie levels in each access to be flexible. (H. Song et al., Proc. of ICNP, 2005.) SST can achieve an approximately 50% reduction in memory accesses compared to the Tree Bitmap. Although this reduction is significant, the number of memory accesses required by the SST is still considerable. In addition, SST is only suitable for sparse tries, limiting its application to future routers.
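The two bitmaps of a Tree Bitmap node can be sketched in Python as follows. This is a simplified illustration under stated assumptions: bitmaps are held as Python integers with bit 0 as the first position, the internal bitmap is laid out in level order, and the helper names (internal_position, longest_internal_match, child_offset) are hypothetical. Actual implementations may order the bits differently.

```python
def internal_position(value, length):
    """Level-order position of a prefix (value, length) inside one node."""
    return (1 << length) - 1 + value


def longest_internal_match(internal_bitmap, chunk, stride):
    """Find the longest prefix stored in this node that matches `chunk`.

    `chunk` holds the `stride` address bits examined at this node.
    Returns the bitmap position of the match, or None if nothing matches.
    """
    for length in range(stride - 1, -1, -1):
        pos = internal_position(chunk >> (stride - length), length)
        if (internal_bitmap >> pos) & 1:
            return pos
    return None


def child_offset(external_bitmap, chunk):
    """Offset of the child for `chunk` within the block of child nodes.

    Children are stored consecutively, so the offset is the number of
    set bits that precede the child's bit in the external bitmap.
    """
    if not (external_bitmap >> chunk) & 1:
        return None
    return bin(external_bitmap & ((1 << chunk) - 1)).count("1")


# Stride-3 node storing the prefixes *, 1* and 10*, with children
# reachable through the 3-bit chunks 001 and 101.
internal = (1 << internal_position(0, 0)) \
    | (1 << internal_position(0b1, 1)) \
    | (1 << internal_position(0b10, 2))
external = (1 << 0b001) | (1 << 0b101)
```

Because the child offset is derived by counting bits, a single head pointer suffices to locate any child in the consecutively stored block. This consecutive storage is also why adding a child can force other children to move.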
A different way to reduce memory accesses in the Tree Bitmap architecture is to increase the “stride size”. The stride of a multibit trie refers to the number of address bits examined at each trie node (that is, the number of single-bit trie levels collapsed into one node). However, increasing the stride size will increase the bitmap size exponentially and result in more off-chip memory accesses, which limit system performance. Another disadvantage of choosing a large stride size is that update speed may be degraded. This is because there will be more child nodes under each trie node, and they are stored in consecutive memory locations. Whenever a new child node is added, many other child nodes are moved to other memory locations. In the worst case, an entire block of child nodes is relocated.
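The exponential growth of the bitmaps can be made concrete with a short calculation. The sketch below assumes the usual Tree Bitmap layout, in which a node covering s address bits has an internal bitmap of 2^s - 1 bits (one per possible prefix of length 0 to s - 1) and an external bitmap of 2^s bits (one per possible child). The function name is hypothetical.

```python
def tree_bitmap_node_bits(stride):
    """Return (internal, external) bitmap sizes in bits for one node."""
    internal = (1 << stride) - 1   # prefixes of length 0 .. stride-1
    external = 1 << stride         # one bit per possible child node
    return internal, external


for stride in (2, 4, 8):
    internal_bits, external_bits = tree_bitmap_node_bits(stride)
    print(stride, internal_bits, external_bits)
```

Under this assumption, doubling the stride from 4 to 8 grows the bitmaps from 15 and 16 bits to 255 and 256 bits, so a node may no longer fit in a single memory word.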
Another typical drawback of trie-based schemes is their uneven distribution of data structures in memory. Usually in the tries, the lower levels contain many more prefixes than the higher levels. Each pipeline stage consists of either one level or multiple levels in the trie, and typically stores the information of its prefixes in a memory bank. As the number of prefixes differs drastically from stage to stage, the loading among memory modules is quite uneven, resulting in low memory utilization. W. Jiang et al. proposed a solution to balance the pipeline memory. (W. Jiang et al., Proc. of CF, 2008, pp. 241-250.) However, their scheme uses twenty-five independent memory chips, resulting in a high cost. The number of memory chips required is even greater when IPv6 is to be supported.
In view of the foregoing, it would be useful to provide a route lookup system that overcomes one or more of the above-described limitations.