The present invention relates generally to data communications, and more particularly to systems and methods of routing or monitoring network data according to routing rules.
In connectionless networks like the Internet, each router along the packet path from a source must make an independent decision on how to forward a packet closer to its destination. Each such router and switch must also make an independent decision on how to allocate its resources (buffer memory, fabric capacity, link bandwidth, etc.) when faced with competition among packets for these resources. A system makes this decision by examining the initial portion (“header”) of the packet, and from fields contained within the header, determining the appropriate local action for the packet.
The speed (i.e., data throughput) of a router can be limited by one of many factors. A first factor is the data interface of the device. The data interface is that portion of the device that is connected to the data transmitting media. Presently, with fiber optic and other technologies, data transfer rates of 10 gigabits per second (Gbits/s) and greater can be achieved. A second factor is the “switch matrix” within a router. The switch matrix enables the physical data path between the interfaces of a device, and typically includes a number of integrated circuits (chips) connected by one or more buses. Advances in scheduling algorithms (to determine priority and timing of data paths) as well as improvements in interconnect technology (both chip-to-chip and on-chip) currently allow for transfer rates of hundreds of Gbits/s. However, this high data transfer rate given for a switch matrix assumes that the router has already determined the correct output port for the data packet. This leads to a third, and arguably the most difficult, factor involved in scaling up router throughput: the lookup engine.
The lookup engine searches a table of “rules” which define sets of possible values for the various header fields of interest. (What we call a “rule” is also known variously as a route (when it specifies destination-prefix only), filter (in a firewall environment), traffic class, aggregation rule, etc.) These rules are derived from routing protocols, manual configuration, or other means. While a single rule may specify an exact packet header, more typically it specifies a range of values. The lookup engine searches the table, using the packet header information as a search key. The result of the search (“associated data”) tells the system where and how to forward the packet.
Were destination addresses all uniform, the lookup operation would require a simple “exact-match” search, for which there are many well-known approaches. Unfortunately, this is not the case. Some protocols, such as the Internet protocol (IP), require “longest prefix matching” or “masked prefix matching” lookup operations.
The problem of longest prefix matching may be best understood by example. Accordingly, a two-rule example for a 32-bit address is given below. Addresses are written by separating “octets” (8-bit data sections) by periods. Thus, the address 192.9.0.0 is equal to (11000000 00001001 00000000 00000000). The two routing rules are
192.9/16 → Next hop “A”
192.9.4.0/26 → Next hop “B”
The first rule requires that the first 16 bits of the 32-bit search key be examined, and if they are equal to 192.9, the packet will be forwarded to a next hop (port or interface) identified as A. However, within the same address range is a subset of the search key that includes the same first 16 bits as the first rule, but further includes ten more bits for comparison. Thus, the second rule requires that the first 26 bits of the search key be examined. If the first 26 bits are equal to 192.9.4.0, the packet will be forwarded to the next hop identified as B. IP and IPX routing, as well as other applications, require that the longest prefix match (26-bit in the example) take precedence over a shorter prefix match (16-bit in the example). Thus, a longest prefix matching capability is a necessary function for these router applications.
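The precedence of the longer prefix can be illustrated with a minimal sketch. It assumes a linear scan over a small rule list, which is chosen only for clarity and is not a technique proposed in this document; the names RULES, longest_prefix_match and lookup are hypothetical.

```python
import ipaddress

# Hypothetical rule table for illustration: (prefix value, prefix length, next hop).
RULES = [
    (int(ipaddress.IPv4Address("192.9.0.0")), 16, "A"),
    (int(ipaddress.IPv4Address("192.9.4.0")), 26, "B"),
]

def longest_prefix_match(key, rules=RULES):
    """Return the next hop of the longest matching prefix, or None."""
    best_len, best_hop = -1, None
    for value, length, hop in rules:
        # Mask with `length` leading ones in a 32-bit word.
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        if (key & mask) == (value & mask) and length > best_len:
            best_len, best_hop = length, hop
    return best_hop

def lookup(addr):
    """Convenience wrapper taking a dotted-quad string."""
    return longest_prefix_match(int(ipaddress.IPv4Address(addr)))
```

With these two rules, an address such as 192.9.4.17 matches both prefixes and resolves to next hop B, while 192.9.200.1 matches only the 16-bit prefix and resolves to next hop A.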
The masked-prefix matching problem is a generalization of longest-prefix matching. In longest-prefix matching, each rule “examines” a contiguous set of bits in the search key, starting from the most-significant bit. The 192.9/16 rule above, for example, examines the high 16 bits of the search key and tests against the value 192.9. In masked prefix matching, each rule examines a subset of the bits in the search key, but the subset does not have to be contiguous, nor does it have to begin at the most-significant bit. For example, a masked-prefix rule may take the form
192.9.×.0/26 → Next hop “C”
which would examine the first 16 bits, ignore the next 8 bits, and examine the next 2 bits of the search key. A match is successful only if the first 16 bits are equal to 192.9 and the 25th and 26th bits are equal to 00. Any don't-care bits within the prefix we call “gaps” in the prefix. An ordinary longest-prefix match does not permit gaps. Note that although the above example shows a gap of exactly 8 bits and aligned on an 8-bit boundary, in a masked-prefix rule gaps or don't-care bits may appear in any bit position(s) within the prefix.
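A masked-prefix rule of this kind can be expressed as a value paired with an arbitrary bit mask. The following sketch assumes that representation; the names MASK_C, VALUE_C, ip and masked_match are hypothetical and serve only to illustrate the rule for next hop C.

```python
import ipaddress

def ip(addr):
    """Convert a dotted-quad string to a 32-bit integer."""
    return int(ipaddress.IPv4Address(addr))

# Rule "C": examine bits 1-16, ignore bits 17-24 (the gap), examine bits 25-26.
MASK_C = 0xFFFF00C0
VALUE_C = ip("192.9.0.0")

def masked_match(key, value, mask):
    """True if key agrees with value on every bit selected by mask."""
    return (key & mask) == (value & mask)
```

Under this rule the third octet is ignored entirely, so 192.9.77.5 matches, whereas 192.9.77.64 does not because its 25th and 26th bits are 01 rather than 00.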
A number of approaches have arisen to address the longest prefix matching problem; some work, though much less, has been done on masked prefix matching. A common technique, because of the complexity of longest prefix and masked prefix searches, involves caching the results of recent complex lookup operations in an exact-match search table, such as a binary content addressable memory (CAM) or a hash table. If a packet header arrives that exactly matches one of the cached lookup operations, there is a cache hit, and the complex lookup operation for that particular destination address is avoided. Such an approach can be useful where the number of different addresses handled by the router is limited. However, in the event the router must handle a large number of addresses, the use of cached longest prefix match results is not practical. Further, this approach exhibits “traffic dependency”: its performance is dependent on the assumption that the router will need to look up a small number of unique packet header values over and over again, which is not true for many traffic patterns. Finally, although caching can reduce the load on the longest prefix matching or masked prefix matching system, it still relies on an underlying complex search for any header which is not found in the cache, so the problem of fast hardware searching is still present.
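The traffic dependency of the caching technique can be seen in a minimal sketch. It assumes an unbounded Python dictionary as the exact-match table and a stand-in function for the complex search; the names CachedLookup and slow_lookup are hypothetical.

```python
def slow_lookup(key):
    """Stand-in for the complex search (assumption: one 16-bit rule, 192.9/16)."""
    return "A" if (key >> 16) == 0xC009 else None

class CachedLookup:
    """Exact-match cache placed in front of a slower lookup function."""

    def __init__(self, slow):
        self.slow = slow
        self.cache = {}   # exact-match table: full key -> result
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: the underlying complex search is still required.
        self.misses += 1
        result = self.slow(key)
        self.cache[key] = result
        return result
```

When the same key is looked up repeatedly, only the first lookup reaches the slow path. When every key is distinct, every lookup is a miss and the cache provides no benefit.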
A second technique relies on specialized memory circuits. Such memories are typically variations on standard CAMs, and include internal circuitry capable of performing a longest prefix match operation or other masked searches. Such specialized CAMs have drawbacks, however, in that they can be more expensive than commodity memory devices, consume more power, and are limited in their density. In addition, because of specialized semiconductor process requirements, such CAM variations can be difficult to integrate with logic circuits, in the event the CAM variation is to be “embedded” to form a single integrated circuit.
Other prior art approaches in the literature include “software-oriented” and “hardware-oriented” search algorithms.
Software-oriented algorithms are designed to run on conventional computer system platforms. “Patricia - a practical algorithm to retrieve information coded in alphanumeric,” Journal of the ACM, v15, #4, October 1968, pp. 515-534, by Morrison, discloses an algorithm (named “Patricia”) that is utilized by many routers, in one variation or another, for performing the lookup function. A drawback to the Patricia algorithm is the number of memory accesses that are required for the system running the algorithm. In a worst-case lookup, 32 memory accesses are required. For the average lookup case, between five and ten memory accesses are required. Large numbers of memory accesses can add considerable time to the routing function, and are therefore undesirable.
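The cost of bit-by-bit tree traversal can be illustrated with a simplified sketch. Note that this is a plain one-bit-per-level binary trie, not the Patricia algorithm itself (Patricia additionally compresses single-child paths); the names Node, insert and lookup are hypothetical. Each node visited stands in for one memory access.

```python
class Node:
    """One trie node: two child links and an optional next hop."""
    __slots__ = ("children", "hop")

    def __init__(self):
        self.children = [None, None]
        self.hop = None

def insert(root, value, length, hop):
    """Insert a prefix of `length` bits taken from the top of `value`."""
    node = root
    for i in range(length):
        bit = (value >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = Node()
        node = node.children[bit]
    node.hop = hop

def lookup(root, key):
    """Return (next hop of longest match, number of nodes visited)."""
    node, best, visited = root, root.hop, 1
    for i in range(32):
        node = node.children[(key >> (31 - i)) & 1]
        if node is None:
            break
        visited += 1
        if node.hop is not None:
            best = node.hop   # remember the longest match seen so far
    return best, visited
```

In this simplified structure the number of nodes visited grows with the length of the matching prefix, which is why long prefixes make such lookups slow.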
More recent software algorithms have been developed that provide improvements over the Patricia algorithm. Four such algorithms are disclosed in “Small Forwarding Tables for Fast Routing Lookups,” Proceedings of the 1997 ACM SIGCOMM Conference, Cannes, France, 1997 by Brodnick et al., “Routing on Longest-Matching Prefixes,” IEEE/ACM Transactions on Networking, v4, #1, February 1996, pp. 86-97 by Doeringer et al. (and the related U.S. Pat. No. 5,787,430: “Variable length data sequence backtracking a trie structure”), “IP Lookups using Multiway and Multicolumn Search,” IEEE INFOCOM '98 Proceedings v3 pp. 1248-1256 by Lampson et al., and “Scalable high speed IP routing lookups,” Proceedings of the 1997 ACM SIGCOMM Conference, Cannes, France, 1997, by Waldvogel et al. These algorithms are optimized for software implementation with general purpose processors, and rely on the pre-processing of routing tables to reduce the number of memory accesses required for each lookup operation. As a result, these more recent algorithms are useful in traditional software based routers, in which a central processing unit (CPU) examines and forwards each packet that is received. However, faster routers utilize application specific integrated circuits (ASICs) that are custom designed to perform the forwarding function, and leave only routing table updates and error processing to a general purpose CPU. While the above-referenced algorithms can be implemented in hardware, they are not well suited for such an approach, as they rely on capabilities specific to general purpose CPUs, such as complex CPU caches.
Algorithms optimized for implementation in hardware have been developed. One such algorithm is disclosed in “Routing Lookups in Hardware at Memory Access Speeds,” IEEE INFOCOM '98 Proceedings v3 pp. 1240-1247 by Gupta et al. Gupta et al. relies on a large memory size of 56 to 264 megabits, and has poor table-update performance. It also works best with tables of short prefixes; performance is worse when many prefixes in the table are long. In addition to the limitations above, neither the software-oriented nor the hardware-oriented algorithms referenced above can handle masked-prefix searches.
Due to the cost and time required to execute longest prefix match operations, it would be desirable to find some way of performing a longest prefix matching operation that does not require a large number of memory accesses. Such a solution would provide a significant improvement in lookup engine operations, and thereby improve the overall data throughput of a router or bridge.
It would also be desirable to have a method for longest prefix match operations that provides advantages over prior art methods, but can still be efficiently implemented in hardware. Such a method could then be advantageously used in current high-speed router architectures that do not use a general purpose processor for the lookup function.
It is also desirable to arrive at a high-speed solution to the longest prefix matching problem that utilizes as small a memory structure size as possible. In addition, because embedded memory circuits (such as DRAMs or SRAMs) can result in very high speed memory accesses, it would also be desirable to arrive at a fast solution to the longest prefix lookup problem that is also amenable to being implemented in an integrated circuit having embedded memory.
In addition to performance, size and implementation concerns, another important aspect of a lookup operation is the time required to update a data structure containing the routing rules. In prior art approaches, table updates can require long “stalls” (interruptions in routing lookups) when the updated data structure is written to the lookup engine memory. One way to limit stalls is to utilize a “double buffered” system. A double buffered system maintains two copies of a data structure, one used for lookups, the other for updates. Periodically, the functions of the two copies are switched. Such an approach obviously doubles the amount of memory necessary for the lookup table. It also requires significant additional memory bandwidth, with attendant complexity and chip pin count, in order to write one copy without interrupting the reads of the other table.
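The double buffered arrangement can be sketched as follows. The sketch assumes a simple exact-match dictionary in place of a real lookup structure, and the class and method names are hypothetical; it is intended only to show why two full copies of the table are needed.

```python
class DoubleBufferedTable:
    """Two copies of a table: one serves lookups, the other takes updates."""

    def __init__(self):
        self.tables = [{}, {}]
        self.active = 0   # index of the copy currently used for lookups

    def lookup(self, key):
        # Lookups read only the active copy and are never blocked by updates.
        return self.tables[self.active].get(key)

    def update(self, key, value):
        # Updates are written only to the inactive (shadow) copy.
        self.tables[1 - self.active][key] = value

    def swap(self):
        # Switch roles, then bring the new shadow copy up to date.
        self.active = 1 - self.active
        self.tables[1 - self.active] = dict(self.tables[self.active])
```

An update becomes visible to lookups only after the swap, and the second copy accounts for the doubled memory requirement noted above.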
It would be desirable to provide a longest prefix matching approach that allows for rapid updates to the data structure used by a lookup engine, and yet does not unduly increase the overall system memory size, or complexity of the lookup engine memory interface.
Finally, it would be desirable to provide a lookup engine that can support both longest prefix matching and masked prefix matching, without the high cost and high power consumption of CAM memories.
According to a first embodiment, a lookup engine receives a search key, and after identifying the best matching prefix or masked-prefix rule for the search key, provides an output value. The first embodiment may include three independent memory arrays that may be accessed in a pipelined fashion, allowing one search operation to be completed on each operational cycle. A first 16-bit portion of a search key may be applied to a first array, which may provide an output value when only prefixes less than or equal to 16 bits are in the possible result set, or a pointer value to a second array, when prefixes greater than 16 bits are in the possible result set. A second 6-bit portion of the search key may be applied to the second array, which may provide an output value when only prefixes less than or equal to 22 bits are in the possible result set, or a pointer value to a third array when prefixes greater than 22 bits are in the possible result set. A third 10-bit portion of the search key may be applied to the third array, which may provide an output value when prefixes greater than 22 bits are in the possible result set.
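The 16/6/10 split of the search key can be illustrated with a simplified software sketch. It assumes uncompressed tables built by brute-force expansion of plain prefix rules, and does not reproduce the compact chunk layout, code values, pipelining or masked-prefix handling of the embodiments; the function names lpm, has_longer, build and lookup are hypothetical.

```python
def lpm(key, rules, max_len):
    """Longest-prefix match over rules no longer than max_len bits."""
    best_len, best_hop = -1, None
    for value, length, hop in rules:
        if length > max_len:
            continue
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        if (key & mask) == (value & mask) and length > best_len:
            best_len, best_hop = length, hop
    return best_hop

def has_longer(prefix, nbits, rules):
    """True if some rule longer than nbits shares the top nbits of prefix."""
    mask = (0xFFFFFFFF << (32 - nbits)) & 0xFFFFFFFF
    return any(l > nbits and (v & mask) == (prefix & mask) for v, l, _ in rules)

def build(rules):
    """Build three arrays; entries are ("value", hop) or ("pointer", index)."""
    array1, array2, array3 = [], [], []
    for i in range(1 << 16):
        p16 = i << 16
        if not has_longer(p16, 16, rules):
            array1.append(("value", lpm(p16, rules, 16)))
            continue
        chunk2 = []
        for j in range(1 << 6):
            p22 = p16 | (j << 10)
            if not has_longer(p22, 22, rules):
                chunk2.append(("value", lpm(p22, rules, 22)))
                continue
            # Third level: one output value per 10-bit remainder.
            array3.append([lpm(p22 | k, rules, 32) for k in range(1 << 10)])
            chunk2.append(("pointer", len(array3) - 1))
        array2.append(chunk2)
        array1.append(("pointer", len(array2) - 1))
    return array1, array2, array3

def lookup(key, tables):
    """At most three array reads: 16-bit, then 6-bit, then 10-bit portion."""
    array1, array2, array3 = tables
    kind, x = array1[key >> 16]
    if kind == "value":
        return x
    kind, x = array2[x][(key >> 10) & 0x3F]
    if kind == "value":
        return x
    return array3[x][key & 0x3FF]
```

Built from a 16-bit rule for 192.9 and a 26-bit rule for 192.9.4.0, these tables resolve most keys in the first array, and only keys beginning with 192.9 proceed to the second and third arrays.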
The data structure stored within the first, second and third arrays has a novel compact structure, allowing conventional memory devices to be used as storage elements. Alternate embodiments, optimized for implementation in software, and a corresponding data structure are also disclosed, along with approaches to updating the data structure and to managing the memory that stores the array values.
According to one aspect of the embodiments, the values within the second and third arrays are arranged in compact “chunks,” each of which may include a number of entries. The layout of each chunk within the second array may be summarized by a code value contained in the corresponding first array pointer entries, allowing for compact second array pointer entries. Similarly, the layout of each chunk within the third array may be summarized by a code value contained in the second array pointer entries, allowing for compact third array pointer entries.
According to another aspect of the first embodiment, pointer values to the second array include reduced bit length code pointer values. Code pointer values may be applied to a code dictionary which provides a longer bit length code value. The code value may summarize the arrangement of second array chunks.
According to another aspect of the first embodiment, code values are merged within the code table to create a more compact code table.
According to another aspect of the first embodiment, entries within the third array are accessed by information contained in pointer values within both the first array and second array.
According to another aspect of the first embodiment, the second array includes a number of sub-arrays. Within each sub-array addresses may be indexed by the second portion of the destination address, and which sub-array to enable is selected according to bits in the first portion of the destination address.
According to one aspect of the embodiments, updates to the compact data structure can be accomplished with minimal interruptions to the routing functions.
An advantage of the embodiments is that throughput of one lookup operation per memory access may be accomplished, providing speed advantages over the cited prior art approaches.
Another advantage of the embodiments is that they may provide small data structures to accomplish the rapid lookup function, and thus can utilize inexpensive static random access memories (SRAMs), dynamic random access memories (DRAMs), or embedded memories.
An advantage of one embodiment is that it may provide a single high-speed lookup solution to the longest prefix matching problem, and so may be used to provide a single replacement for those routers that include both a “fast path” and “slow path.”
An advantage of the first embodiment is that it may be optimized for implementation in hardware and can be easily integrated with other logic.
Another advantage of the first embodiment is that it may provide rapid masked prefix matching operations.