1. Field of the Invention
The present invention relates to memory devices for computers and other signal processing applications, and, in particular, to a memory architecture for routing tables used in packet-based communications, such as IP-routing tables.
2. Description of the Related Art
A critical function in network routers used in packet-based communications networks is packet classification (i.e., determining routing and traffic policies for each incoming packet based on information from the packet itself). A prime example is the Internet Protocol's basic routing function (IP-lookup), which determines the next network hop for each incoming packet. Its complexity stems from wildcards in the routing tables and from the Longest-Prefix Match (LPM) algorithm mandated by the Classless Inter-Domain Routing (CIDR) protocol.
Since the advent of CIDR in 1993, IP routes have been identified by a <route prefix, prefix length> pair, where the prefix length is between 1 and 32 bits. For every incoming packet, a search must be performed in the router's forwarding table to determine the packet's next network hop. The search may be decomposed into two steps. First, the set of routes with prefixes that match the beginning of the incoming packet's IP destination address is found. Then, among this set of routes, the one with the longest prefix is selected. This identifies the next network hop.
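The two-step search described above can be sketched in Python as a simple linear scan over a route list. This is an illustrative model only, not a router implementation. The function name `longest_prefix_match` and the list-of-pairs route format are hypothetical, and real routers never scan every route sequentially.

```python
import ipaddress

def longest_prefix_match(routes, dest):
    """Return the next hop of the longest matching prefix, or None.

    routes: list of (prefix, next_hop) pairs, e.g. ("10.0.0.0/8", "A").
    Hypothetical format chosen for illustration.
    """
    addr = ipaddress.IPv4Address(dest)
    # Step 1: collect every route whose prefix matches the leading bits of dest.
    matches = [(ipaddress.IPv4Network(p), hop) for p, hop in routes
               if addr in ipaddress.IPv4Network(p)]
    if not matches:
        return None
    # Step 2: among the matching routes, select the one with the longest prefix.
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop
```

With routes 10.0.0.0/8, 10.1.0.0/16 and 10.1.2.0/24, the address 10.1.2.3 matches all three, and the /24 route is selected.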
What makes IP-lookup an interesting problem is that it must be performed increasingly fast on increasingly large routing tables. Today's leading (2.5, 5, and 10 Gbit/sec) network processors achieve the necessary lookup rate using a combination of high-speed memories and specialized access hardware. Another direction concentrates on organizing routing tables into optimized data structures, often tries (a form of tree), so as to minimize the average number of memory accesses needed to perform LPM. Each lookup, however, still requires several dependent (serialized) memory accesses, stressing conventional memory architectures to the limit. With these approaches, memory latency, not bandwidth, is the limiting factor.
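The serialized-access problem can be seen in a minimal sketch of a unibit binary trie. This is only one of many trie variants, chosen here for simplicity. Addresses and prefixes are written as bit strings, and the names `TrieNode`, `trie_insert` and `trie_lookup` are hypothetical. Each step down the trie needs the result of the previous node read, so the reads cannot be overlapped. The sketch counts them.

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # child for next bit 0 and next bit 1
        self.next_hop = None          # set if a route prefix ends at this node

def trie_insert(root, prefix_bits, next_hop):
    """Insert a route given as a bit string, e.g. "1011"."""
    node = root
    for b in prefix_bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.next_hop = next_hop

def trie_lookup(root, addr_bits):
    """Walk the trie bit by bit; return (next_hop, dependent_node_reads)."""
    node, best, reads = root, root.next_hop, 1  # reading the root is one access
    for b in addr_bits:
        node = node.children[int(b)]
        if node is None:
            break
        reads += 1  # this read could only start after the previous one finished
        if node.next_hop is not None:
            best = node.next_hop  # remember the longest prefix seen so far
    return best, reads
```

With routes "1", "10" and "1011", a lookup of "10110000" returns the "1011" route after five dependent node reads. A 32-bit address can need up to 33 reads in this naive form, which is why practical tries consume several bits per level.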
A fruitful approach to circumvent latency restrictions is through parallelism: searching all the routes simultaneously. Content addressable memories (CAMs) perform this fully parallel search. CAM differs from standard memory as follows. In standard memory, an “input address” is specified, and the memory returns the data stored at that address. In CAM, “input data” is specified, and the memory returns the address where that data is stored.
To handle route prefixes (e.g., routes ending with wildcards), ternary CAMs (TCAMs) are used. Ternary CAMs enhance the functionality of standard (binary) CAMs with the addition of a “local mask” for each entry, where the local mask has the same number of bits as the CAM entries. This mask specifies which bits of an entry must coincide with the input data for there to be a match. In particular, TCAMs have an additional “don't care” bit for every tag bit. When the “don't care” bit is set, the tag bit becomes a wildcard and matches anything. The ternary capability of TCAMs makes them an attractive solution for the IP-lookup problem and thus TCAMs have found acceptance in many commercial products.
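The per-entry ternary comparison can be modeled in a few lines. In this sketch, each entry stores its tag bits together with a "care" mask, where 1 means the bit must match. This mask is the complement of the per-bit "don't care" bits described above. A binary CAM entry is the special case where every bit is cared about. The class `TcamEntry` and helper `prefix_entry` are hypothetical names for illustration.

```python
from dataclasses import dataclass

@dataclass
class TcamEntry:
    value: int  # stored tag bits
    care: int   # local mask: 1 = bit must coincide with input, 0 = "don't care"

    def matches(self, key: int) -> bool:
        # XOR marks differing bits; only differences in cared-about bits count.
        return ((key ^ self.value) & self.care) == 0

def prefix_entry(prefix: int, length: int, width: int = 32) -> TcamEntry:
    """Build a ternary entry for a route prefix: the trailing bits become wildcards."""
    care = ((1 << length) - 1) << (width - length)
    return TcamEntry(value=prefix & care, care=care)
```

For 8-bit keys, the prefix 1010xxxx becomes `prefix_entry(0b10100000, 4, width=8)`. It matches 0b10101111 but not 0b10110000. An entry with `care=0xFF` behaves as a binary CAM entry and matches only its exact value.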
In a TCAM, IP-lookup is performed by storing routing table entries in order of decreasing prefix length. Among all the entries that match the incoming packet's destination address, the TCAM automatically reports the first one (the top-most match). Because of the sort order, this is the longest matching prefix. The need to maintain a sorted table in a TCAM makes incremental updates difficult. If N is the total number of prefixes stored in an M-entry TCAM, naively adding a new prefix can require O(N) moves of existing entries. Significant effort has been devoted to addressing this problem; however, all the proposed algorithms require an external entity to manage and partition the routing table.
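A sequential model of a sorted TCAM shows how top-most matching yields LPM and why naive insertion is expensive. The loop in `lookup` stands in for the parallel hardware comparison. The insertion counts how many existing entries must shift down to open a slot at the correct position. The class `NaiveTcam` and its slot layout are hypothetical. The sketch assumes the simplest policy of a fully sorted table with no free slots reserved between prefix lengths.

```python
class NaiveTcam:
    """Sequential model of a TCAM kept sorted by decreasing prefix length.

    Each slot is (value, care_mask, prefix_len, next_hop).
    """

    def __init__(self, width=32):
        self.width = width
        self.slots = []

    def insert(self, prefix, length, next_hop):
        """Insert while preserving the sort order; return the number of moves."""
        care = ((1 << length) - 1) << (self.width - length)
        entry = (prefix & care, care, length, next_hop)
        # The new entry goes just above the first strictly shorter prefix.
        pos = len(self.slots)
        for i, slot in enumerate(self.slots):
            if slot[2] < length:
                pos = i
                break
        # Every entry at or below pos shifts down one slot: up to N moves.
        moves = len(self.slots) - pos
        self.slots.insert(pos, entry)
        return moves

    def lookup(self, key):
        """Report the top-most match, which is the longest prefix given the order."""
        for value, care, _, next_hop in self.slots:
            if ((key ^ value) & care) == 0:
                return next_hop
        return None
```

With 8-bit keys, inserting 1xxxxxxx, then 1010xxxx, then 10xxxxxx leaves the table ordered /4, /2, /1. A subsequent full-length /8 entry must go above all three, which costs three moves. In general, inserting a longest-prefix entry into a table of N shorter ones moves all N.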
In addition to the update problems, two other major drawbacks hamper the wide deployment of TCAMs: high cost/density ratio and high power consumption. Because the TCAM is fully associative, every search performs comparisons across the whole memory array, which consumes considerable power. A typical 18-Mbit, 512K-entry TCAM can consume up to 15 Watts when all the entries are searched. TCAM power consumption is critical in router applications because it affects two important router characteristics: linecard power and port density. Linecards have fixed power budgets because of cooling and power-distribution constraints. Thus, typically only a few power-hungry TCAMs can be placed on a linecard. This, in turn, reduces port density (i.e., the number of input/output ports that can fit in a fixed volume), thereby increasing the running costs of the routers.
Efforts to divide TCAMs into “blocks” and search only the relevant blocks have reduced power consumption considerably. “Blocked” TCAMs are in some ways analogous to set-associative memories. Set associativity means that a given route or prefix maps to (is placed in) a relatively small set of possible locations in the memory. Within that set, the route/prefix can occupy any location without distinction. Full associativity, on the other hand, means that a given route/prefix can occupy any location in memory. In typical blocked TCAMs, the associativity sets are still disadvantageously large. Moreover, blocking further complicates routing-table management in TCAMs, requiring not only correct sorting but also correct partitioning of the routing tables. Routing-table updates also become more complicated. In addition, external logic is needed to select the blocks to be searched. All these factors make blocked TCAMs harder to use. Even so, they still fail to reduce power consumption below that of a straightforward set-associative array.
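The difference between set and full associativity can be made concrete by listing the slots a key is allowed to occupy, since those are the only slots a lookup must compare. This is a hypothetical model, not the indexing scheme of any particular device. It assumes the set is selected by the top bits of the key and that the number of sets is a power of two. With one set, the array is fully associative.

```python
def candidate_slots(key, num_slots, num_sets, width=32):
    """Slots a key may occupy (and a lookup must compare) in the array.

    Hypothetical model: the set index is taken from the top bits of the key,
    and within the selected set any of the num_slots // num_sets ways may hold it.
    """
    if num_slots % num_sets != 0 or (num_sets & (num_sets - 1)) != 0:
        raise ValueError("num_sets must be a power of two dividing num_slots")
    ways = num_slots // num_sets
    index_bits = num_sets.bit_length() - 1  # log2(num_sets); 0 when fully associative
    set_index = key >> (width - index_bits)
    start = set_index * ways
    return range(start, start + ways)
```

For 8-bit keys in a 16-slot array split into 4 sets, the key 0b11000000 may occupy only slots 12 to 15, so a lookup compares 4 entries. With a single set, all 16 slots are candidates. One caveat for routing tables is that this indexing needs the top index bits to be fixed. A prefix shorter than the index spans several sets, which is one reason partitioning a routing table is harder than partitioning exact-match data.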
More seriously, blocked TCAMs can reduce only average power consumption. When the main constraint is the fixed power budget of a linecard, a reduction in average power consumption is of limited value, since the linecard must still be provisioned for the maximum (worst-case) power consumption.