The internet is a mesh of interconnected routers. The internet protocol (IP) part of the TCP/IP protocol stack is used for communicating data between the routers. The large and constantly increasing internet traffic volume depends on continuously increasing router efficacy.
A router's function is to forward incoming packets toward their final destination, which they reach in multiple hops. To do so, the router performs address lookup, buffering, and scheduling, and finally sends the packet to the next hop address through the appropriate router port. The address lookup, being associative, is a key processing bottleneck: it requires matching the destination IP address encoded in the incoming packet against a table stored in the router. Packets are routed on a next-hop basis, i.e., the router sends an incoming packet to the next hop only. Each router has a database, in the form of a routing table, containing prefixes of varying length and, for each prefix, the corresponding next hop port (NHP).
Classless Inter Domain Routing
The internet protocol (IP) has the task of delivering distinguished protocol datagrams (packets) from the source host to the destination host, based solely on their destination addresses. The IP has worked extremely well, allowing exponential growth of the internet. Initially, IP addresses were divided into five categories, known as classes. To expand the usable IP address space, classless inter-domain routing (CIDR) was implemented. CIDR allocates IP addresses in variable-sized blocks without regard to the previously used classes. CIDR was initially implemented for IPv4, where the address length is 32 bits. With continued internet growth, this address range is being exhausted. Consequently, IPv6, with 128-bit addressing, is being introduced.
Classless inter-domain routing (CIDR) was implemented in 1993 to cope with the increasing demand by allocating addresses in variable-sized blocks without regard to the previously used five classes. Using CIDR, a routing table entry is identified by a route prefix, a prefix length (in the form of mask bits), and an associated output port identifier. The CIDR address lookup mechanism is based on longest prefix matching and uses two steps. First, the routing database (table) is searched to obtain the longest matching prefix from the many that may match the packet's destination IP address. Second, the next hop port associated with this longest matched prefix is determined and the packet is forwarded to the appropriate destination/port. If none of the prefixes match the destination IP address, the packet is sent to a default port. The initial CIDR implementation (IPv4) uses a 32-bit address length. The dramatic growth of the internet is rapidly exhausting this address range, so IPv6, with 128-bit addresses, is being introduced.
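The two-step lookup described above can be sketched in Python using the standard `ipaddress` module. This is a minimal illustration only: the prefixes, port numbers, and the names `ROUTES`, `DEFAULT_PORT`, and `lookup` are assumptions made for the example and do not come from this description.

```python
import ipaddress

# Hypothetical routing table: (route prefix with prefix length, next hop port).
# All values below are illustrative assumptions.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("10.1.0.0/16"), 2),
    (ipaddress.ip_network("10.1.2.0/24"), 3),
]
DEFAULT_PORT = 0  # assumed port used when no prefix matches


def lookup(destination: str) -> int:
    """Return the next hop port for a dotted-quad destination address."""
    addr = ipaddress.ip_address(destination)
    # Step 1: collect every prefix that matches, keyed by prefix length.
    matches = [(net.prefixlen, port) for net, port in ROUTES if addr in net]
    if not matches:
        return DEFAULT_PORT
    # Step 2: the port of the longest matching prefix is the next hop.
    return max(matches)[1]


if __name__ == "__main__":
    for dest in ("10.1.2.3", "10.1.9.9", "10.9.9.9", "192.168.0.1"):
        print(dest, "->", lookup(dest))
```

A linear scan is used here purely for readability; it says nothing about how a router would organize the table in practice.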
Longest Prefix Matching
Routing based on longest prefix matching essentially routes the packet to a location as close as possible to the destination. The destination address of an incoming packet is compared with all of the current prefixes in the routing table to determine the next hop associated with the longest matching prefix. If no prefixes match the destination IP address, the packet is sent to a default port. The length of the valid part of an address can vary up to 32 bits in IPv4 and up to 128 bits in IPv6. Mask bits determine the valid length of the address: working from the MSB (most significant bit) toward the LSB (least significant bit), address bits for which mask bits are ‘1’ are valid and the rest of the address is ignored (see FIG. 1).
FIG. 1 shows a conventional routing table implementation. It consists of a match block and a priority encoder block. Because only address bits whose mask bits are ‘1’ are valid, mask bits are set to ‘1’ in at least part of the MSB portion and to ‘0’ for the unused LSB bits of the IP address. The addresses are grouped together and strictly ordered by mask size. The mask associated with each address is also shown. For instance, when a destination IP address of 192.160.0.128 is compared with the prefixes in the table, it matches the addresses stored at locations 2, 1003 and 1005, but the priority encoder (PE) selects location 2 as it has the longest prefix match. This pointer is used to read the NHP information stored in SRAM or DRAM.
Another example of an IPv4 prefix table is shown in Table 1. When a destination IP address of 128.45.67.12 is compared with the prefixes, it matches entries 1, 4 and 5. The packet is forwarded to the destination specified by next hop port 12, since entry 5 is the longest prefix match.
TABLE 1
IPv4 routing table example

Prefix             Mask               Next Hop Port
128.45.67.12       255.255.255.128    4
192.125.167.129    255.0.0.0          6
192.45.121.112     255.255.0.0        2
128.45.67.35       255.255.255.0      9
128.45.67.12       255.255.255.255    12
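The Table 1 example can be checked with a short Python sketch that applies each mask to both the stored prefix and the destination address. The prefix, mask, and port values are taken from Table 1; the function names and the use of the standard `ipaddress` module for dotted-quad conversion are assumptions made for illustration.

```python
import ipaddress


def to_int(dotted: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


# (prefix, mask, next hop port), in the order the entries appear in Table 1.
TABLE_1 = [
    ("128.45.67.12", "255.255.255.128", 4),
    ("192.125.167.129", "255.0.0.0", 6),
    ("192.45.121.112", "255.255.0.0", 2),
    ("128.45.67.35", "255.255.255.0", 9),
    ("128.45.67.12", "255.255.255.255", 12),
]


def matching_entries(destination: str):
    """Return (entry number, mask length, port) for every matching entry."""
    dest = to_int(destination)
    hits = []
    for number, (prefix, mask, port) in enumerate(TABLE_1, start=1):
        m = to_int(mask)
        # Only bits whose mask bit is 1 take part in the comparison.
        if dest & m == to_int(prefix) & m:
            hits.append((number, bin(m).count("1"), port))
    return hits


def next_hop_port(destination: str, default_port=None):
    """Pick the port of the entry with the longest mask among the matches."""
    hits = matching_entries(destination)
    if not hits:
        return default_port
    return max(hits, key=lambda hit: hit[1])[2]


if __name__ == "__main__":
    print(matching_entries("128.45.67.12"))
    print(next_hop_port("128.45.67.12"))
```

For the destination 128.45.67.12, the matching entry numbers are 1, 4 and 5, and the selected port is 12, in agreement with the text.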
Ordering the entries makes selecting the longest prefix match straightforward—these operations resemble leading zeros detection, since the bottommost match (logic 1) in the table is selected. However, the strict ordering requires that the routing table be taken off-line when new entries are added, since insertions may require substantial shifts in the data locations.
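The trade-off described above, simple selection versus costly insertion, can be illustrated with a small software model. This is a sketch under stated assumptions: entries are kept in ascending order of mask length so that the longest match is the bottommost one, and the class name `OrderedTable` and its methods are hypothetical. It models the behavior only, not any hardware implementation.

```python
import bisect


class OrderedTable:
    """Software model of a routing table strictly ordered by mask length.

    Assumption: shorter masks sit at the top and longer masks at the bottom,
    so the bottommost match is the longest prefix match.
    """

    def __init__(self, width: int = 32):
        self.width = width
        self.lengths = []  # mask length stored at each location
        self.entries = []  # (masked prefix value, mask, next hop port)

    def insert(self, value: int, length: int, port: int) -> int:
        """Insert an entry and return how many existing entries had to shift."""
        mask = ((1 << length) - 1) << (self.width - length)
        pos = bisect.bisect_right(self.lengths, length)
        shifted = len(self.lengths) - pos  # everything below pos moves down
        self.lengths.insert(pos, length)
        self.entries.insert(pos, (value & mask, mask, port))
        return shifted

    def lookup(self, dest: int, default_port=None):
        """Compare against all entries, then select the bottommost match."""
        match_bits = [(dest & mask) == value for value, mask, _ in self.entries]
        for location in range(len(match_bits) - 1, -1, -1):
            if match_bits[location]:
                return self.entries[location][2]
        return default_port


if __name__ == "__main__":
    table = OrderedTable()
    print(table.insert(0x0A000000, 8, 1))   # 10.0.0.0/8
    print(table.insert(0x0A010200, 24, 3))  # 10.1.2.0/24
    print(table.insert(0x0A010000, 16, 2))  # 10.1.0.0/16 lands in the middle
    print(table.lookup(0x0A010203))
```

In this model an insertion in the middle of the table shifts every entry below it, which is the reason the strictly ordered scheme is awkward to update in place.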
Software IP Lookup
Software approaches have the advantage of programmability, but the associative lookup requires multiple clock cycles. A tree-based data structure can be used for IP address storage and lookup. For IPv4, the longest prefix length may be 32 bits, so an IP lookup requires up to 32 memory accesses. To decrease the number of memory accesses required, a complete binary tree expansion has been proposed, but this requires an array with 2^32 entries. A forwarding table scheme reduces the memory storage size and the number of accesses, but is also large. In general, any software approach on standard microprocessors must account for issues such as the impact of cache misses, the number and latency of memory accesses, and the multiple processor clock cycles needed for search execution.
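As an illustration of why a tree-based lookup can need up to 32 memory accesses for IPv4, the following Python sketch implements a simple binary trie that consumes one address bit per level. It is a minimal sketch, not a description of any particular published scheme: the class names are hypothetical, and each node visited below the root is counted as one memory access, which is a simplifying assumption.

```python
class TrieNode:
    """One node of a binary trie; port is set only where a prefix ends."""

    __slots__ = ("children", "port")

    def __init__(self):
        self.children = [None, None]
        self.port = None


class BinaryTrie:
    def __init__(self, width: int = 32):
        self.width = width
        self.root = TrieNode()

    def insert(self, value: int, length: int, port: int) -> None:
        """Store a prefix given as an integer value and a prefix length."""
        node = self.root
        for i in range(length):
            bit = (value >> (self.width - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.port = port

    def lookup(self, dest: int, default_port=None):
        """Return (port of longest matching prefix, nodes visited below root)."""
        node = self.root
        best = node.port if node.port is not None else default_port
        accesses = 0
        for i in range(self.width):
            bit = (dest >> (self.width - 1 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            accesses += 1
            if node.port is not None:
                best = node.port  # remember the longest match seen so far
        return best, accesses


if __name__ == "__main__":
    trie = BinaryTrie()
    trie.insert(0x0A000000, 8, 1)    # 10.0.0.0/8
    trie.insert(0x0A010000, 16, 2)   # 10.1.0.0/16
    trie.insert(0x0A010203, 32, 9)   # 10.1.2.3/32
    print(trie.lookup(0x0A010203))
    print(trie.lookup(0x0A01FFFF))
```

A lookup that ends on a full-length prefix walks all 32 levels, which corresponds to the worst case mentioned above.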
Hardware IP Lookup
IP routing hardware mostly concentrates on matching the destination address with the addresses in the routing table, which while only part of the IP lookup problem, is, as mentioned, the bottleneck. Wade et al. proposed an addressable search engine using a TCAM structure for a database accelerator chip and a modified ripple chain priority encoder. Chuang et al. also proposed using CAM structures. Pei et al. implemented a high radix tree in silicon for exact matching, using a CAM-based forwarding table. Degermark et al. used SRAM and improved the performance by converting the forwarding table radix tree to a complete tree by filling the empty branches, requiring at most four memory accesses. Gupta et al. proposed a two memory access, two-level indirect lookup scheme. Adding a length field to the first (segment) table that maintains the length of the second (offset) table allows a variable offset and thus more efficient memory utilization.
TCAM Based IP Lookup
FIG. 1 is the top level architectural view of a TCAM based router. All valid combinations of w-bit IP prefixes may require as many as 2^(w+1) − 1 entries, i.e., one for the null prefix, covering all entries, plus as many 1 through up to 32-bit prefixes as needed. CAMs provide a single clock latency matching solution. Ternary content addressable memory (TCAM) allows longest prefix match (IP lookup) operation by using the stored “don't-care” state to mask matches, with multiple entries to allow multiple mask lengths to be considered. Each TCAM cell has two stored bits—one for the address and one for the mask. Akhbarizadeh, et al., encoded the mask bit in the address for each 8-bit block using only nine SRAM cells but with a more complicated match line structure. As in the TCAMs, each entry can only compare one prefix length. Thus, entries are required for each possible matching prefix length, i.e., from N to M, where N is the default (no match) address length, and M is the mask length. Obviously, up to M-N entries may match a given address for a single entry, with many other matches obtained from others.
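The ternary matching behavior and the entry count given above can be modeled in a few lines of Python. This is a behavioral sketch only: the names `TcamEntry`, `search`, and `count_prefixes` are hypothetical, each entry stores one address word and one mask word (mirroring the two stored bits per cell), and nothing here represents circuit-level detail.

```python
from typing import List, NamedTuple


class TcamEntry(NamedTuple):
    address: int  # stored address bits
    mask: int     # 1 = compare this bit, 0 = "don't care"


def search(entries: List[TcamEntry], key: int) -> List[bool]:
    """Return one match flag per entry, as the match lines would report.

    An entry matches when every compared (mask = 1) bit equals the key bit.
    """
    return [((key ^ e.address) & e.mask) == 0 for e in entries]


def count_prefixes(w: int) -> int:
    """Count all distinct prefixes of length 0..w: the sum of 2**k."""
    return sum(2 ** k for k in range(w + 1))


if __name__ == "__main__":
    entries = [
        TcamEntry(0x0A000000, 0xFF000000),  # 10.0.0.0/8
        TcamEntry(0x0A010000, 0xFFFF0000),  # 10.1.0.0/16
        TcamEntry(0xC0A80000, 0xFFFF0000),  # 192.168.0.0/16
    ]
    print(search(entries, 0x0A010203))
    # The sum of 2**k for k = 0..w equals 2**(w + 1) - 1.
    print(count_prefixes(32) == 2 ** 33 - 1)
```

Several entries can report a match for the same key, which is why a separate priority selection step is needed to pick the longest one.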
Dynamic NOR match lines discharge on a mismatch, resulting in a high match line activity factor, since most entries do not match, and this leads to high power dissipation. Series transistor connected (NAND) match lines can reduce power, but these large stacks invite charge sharing issues. These can, in turn, be addressed by using a hierarchy of short stacks or by pre-charging the intermediate nodes. Multiple TCAM chips, dissipating up to 15 W each, are required in a high end router.
The conventional TCAM requires finding the longest match by finding the match closest to the bottom of the lookup table, and this is similar to leading zeros detection (it searches for the bottommost logic 1). One such approach is a multi-level look-ahead design using domino logic. These designs are complicated in that the signals cascading from one stage to the next must be domino compatible (monotonic) and impose large clock loading.
Reference TCAM
To provide meaningful power, density and speed comparisons, a reference TCAM array implemented in the same bulk CMOS 65 nm process technology is used in this work. The cell design is shown in FIG. 2(a). Although other cell designs are denser (see FIG. 2(b)), the design in FIG. 2(a) has the least match line capacitance and thereby, lower search power dissipation. It consists of two SRAM cells storing the address and mask bits, respectively. 32 cells, combined with a pre-charge, keeper and latch block comprise one row in an array for address comparison. The basic TCAM block has up to 31 entries for a 32-bit IPv4 address.
For simplicity and clarity of illustration, the drawing figures illustrate the general manner of construction, and descriptions and details of well-known features and techniques may be omitted to avoid unnecessarily obscuring the invention. Additionally, elements in the drawing figures are not necessarily drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve understanding of embodiments of the present invention. The same reference numerals in different figures denote the same elements.
The terms “first,” “second,” “third,” “fourth,” and the like in the description and in the claims, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms “include,” and “have,” and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, device, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, system, article, device, or apparatus.
The terms “left,” “right,” “front,” “back,” “top,” “bottom,” “over,” “under,” and the like in the description and in the claims, if any, are used for descriptive purposes and not necessarily for describing permanent relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in other orientations than those illustrated or otherwise described herein.
The terms “couple,” “coupled,” “couples,” “coupling,” and the like should be broadly understood and refer to connecting two or more elements or signals, electrically, mechanically and/or otherwise. Two or more electrical elements may be electrically coupled together but not be mechanically or otherwise coupled together; two or more mechanical elements may be mechanically coupled together, but not be electrically or otherwise coupled together; two or more electrical elements may be mechanically coupled together, but not be electrically or otherwise coupled together. Coupling may be for any length of time, e.g., permanent or semi-permanent or only for an instant.
An electrical “coupling” and the like should be broadly understood and include coupling involving any electrical signal, whether a power signal, a data signal, and/or other types or combinations of electrical signals. A mechanical “coupling” and the like should be broadly understood and include mechanical coupling of all types. The absence of the word “removably,” “removable,” and the like near the word “coupled,” and the like does not mean that the coupling, etc. in question is or is not removable.