Many real-world systems require searching information at very high speeds; hence hardware-based approaches are often employed. An increasingly common example, and the one which will primarily be used herein, is searching in communication and networking systems. In information searching, there are basically three desired types of searching: exact match, longest prefix match, and partial match searching.
Of present interest is longest prefix match (LPM) searching. A typical application of this is in Internet protocol (IP) based routing. Two approaches widely used today for longest prefix match searching are content addressable memory (CAM) based searching and algorithm-based searching.
FIG. 1 (background art) is a block diagram showing a search system 1 that uses a search engine, as might be found in a typical internetworking configuration today. A controller 2 sends a search key 3 to a search engine 4. In response, the search engine 4 provides a corresponding match address 5, which is used to read an associated content (AC, or associated data) from an AC memory 6. This AC is then returned to the controller 2 as a search result 7. In LPM searching, there may be many entries matching the given search key 3 and, in this case, the entry with the longest matching prefix is chosen.
FIG. 2 (background art) is a table of example search data for the search engine 4 described in FIG. 1. In this example, the maximum length of the search key 3 (i.e., the search key width) is eight bits. The table contains 16 entries. The symbol “*” is used here to indicate that all of the subsequent bits are “don't care” bits; in other words, the bits to the left of the “*” form the prefix that should be used. If the search key is “00001011,” then two entries match: the first and the second. The first entry matches two prefix digits, while the second entry matches four prefix digits. So, the longest prefix matching entry is the second entry, and its corresponding address stores an AC with the value B.
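The selection rule just described can be illustrated with a minimal Python sketch. It is a plain linear scan, not the hardware search engine 4, and the table below is only a hypothetical excerpt: the prefixes 00* and 0000* and the AC value B follow the example above, while the remaining prefixes' AC values (A, C, D) and the function name are assumptions made for illustration.

```python
def longest_prefix_match(table, key):
    """Return (match_address, ac) for the longest prefix matching key, or None."""
    best = None
    best_len = -1
    for address, (prefix, ac) in enumerate(table):
        bits = prefix.rstrip("*")  # bits to the left of "*" form the prefix
        if key.startswith(bits) and len(bits) > best_len:
            best, best_len = (address, ac), len(bits)
    return best


# Hypothetical excerpt of a prefix table; AC values other than "B" are placeholders.
TABLE = [
    ("00*", "A"),
    ("0000*", "B"),
    ("00101*", "C"),
    ("010*", "D"),
]

result = longest_prefix_match(TABLE, "00001011")
print(result)
```

Here the key 00001011 matches both 00* and 0000*, and the scan keeps the entry with the longer prefix, so the second entry (address 1 in zero-based indexing) is returned.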
For implementation of LPM database tables, CAM would appear to provide a good high-speed solution, since the hardware inherently performs a simultaneous comparison of all table entries for the search key. However, as is widely appreciated by those skilled in the art, CAM can only be used for small search spaces because of its high power dissipation, expense, and scalability problems.
Algorithm-based searching is a currently used alternate approach. It is mostly based on a data structure called the “trie.” The trie is a radix search tree. The idea is very simple: a leaf in a tree structure represents a particular prefix and the value of this prefix corresponds to the path from the root of the tree to the leaf.
Consider a small example. FIG. 3 (background art) depicts a trie corresponding to the binary strings (search key 3) in FIG. 2. In particular, the string 010 corresponds to the path starting at the root and ending in a leaf at the third level: first a left-turn (0), then a right-turn (1), and finally a turn to the left (0). This simple structure is not very efficient. The number of nodes may be large and the average depth (the average length of a path from the root to a leaf) may be long.
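A binary trie of this kind can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names are hypothetical, the AC values A and D are placeholders, and, unlike the strict leaf-only description above, the sketch lets an internal node also carry a prefix so that nested prefixes such as 00* and 0000* can coexist.

```python
class TrieNode:
    """One node of a binary trie; children[0] is the left turn, children[1] the right."""

    def __init__(self):
        self.children = [None, None]
        self.ac = None  # associated content if a prefix ends at this node


def insert(root, prefix_bits, ac):
    """Walk (and create) the path spelled by prefix_bits, then store ac at its end."""
    node = root
    for bit in prefix_bits:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.ac = ac


def lookup(root, key_bits):
    """Follow key_bits one bit per level, remembering the deepest prefix seen."""
    node = root
    best = node.ac
    for bit in key_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.ac is not None:
            best = node.ac
    return best


root = TrieNode()
insert(root, "00", "A")    # placeholder AC value
insert(root, "0000", "B")
insert(root, "010", "D")   # left (0), right (1), left (0)
print(lookup(root, "00001011"))
```

Each bit of the key costs one node visit, which is why the depth of the trie directly determines the lookup time.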
A traditional technique to overcome this problem is to use path compression. Each internal node with only one child is removed, and the remaining nodes store two numbers, the “skip count” and the “skip value,” which indicate, respectively, how many bits have been skipped on the path and what their values were. A path-compressed binary trie is sometimes referred to as a Patricia tree.
FIG. 4 (background art) shows a Patricia trie. It is the most common approach currently used for address table lookup. For instance, it is used in the FreeBSD Unix kernel (developed at the University of California, Berkeley). The total number of nodes in the trie is 2n−1, where n is the number of leaves in the trie. A lookup takes O(n) steps in the worst case, where n here is the number of bits in a search string. If one lookup per memory access time is desired, pipelining is therefore needed, and each level needs to be stored in a separate memory. However, in the case of IPv4 (Internet protocol, version four), for example, this will lead to a wasteful 32-stage pipeline for the 32-bit string searching required.
FIG. 5 (background art) shows another trie scheme which may be used to speed up lookups, the “expanded multi-bit trie.” Continuing with the example data in FIG. 2, a multi-bit trie that examines addresses two bits at a time and stores the first five prefixes is shown in FIG. 5. Each array has four locations, each of which can store a prefix and optionally also contain a pointer to another trie node. Thus the 00 entry in the root node stores the prefix P1=00*, and also points to all prefixes that start with 00. All prefixes must be expanded to lengths that are multiples of two. Thus the prefix P4=00101* here necessarily expands to two 6-bit prefixes, 001010* and 001011*, and is stored in the third and fourth locations of the right-most array. Similarly, the prefix P5=010* expands to two 4-bit prefixes, 0100* and 0101*, and is stored in the first two locations of the lower-left-most array.
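The expansion step can be illustrated with a short Python sketch. The function name and signature are hypothetical; the sketch only shows how a prefix is padded out to the next multiple of the stride (two bits in this example) by enumerating every combination of the missing bits.

```python
def expand_prefix(prefix_bits, stride):
    """Expand prefix_bits to all prefixes whose length is the next multiple of stride."""
    pad = -len(prefix_bits) % stride  # number of bits missing to reach a multiple
    if pad == 0:
        return [prefix_bits]  # already aligned, nothing to expand
    # Append every pad-bit combination, in increasing order.
    return [prefix_bits + format(i, f"0{pad}b") for i in range(2 ** pad)]


print(expand_prefix("00101", 2))  # P4 = 00101*
print(expand_prefix("010", 2))    # P5 = 010*
print(expand_prefix("00", 2))     # P1 = 00* is already a multiple of two
```

With a stride of two, a prefix is at most one bit short and so expands into at most two entries; with a stride of k, a single prefix can expand into as many as 2^(k−1) entries.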
It is important to note that only the prefix entry with the longest prefix is stored in each trie node location. So, the insertion of a single prefix may result in the need to either write or update many array locations. Similarly, the removal of a single prefix may also result in many entries having to be deleted or updated, in order to restore the information of a shorter prefix. This process, which can be quite demanding, usually burdens the CPU (if managed by software) or requires complicated logic (if managed by hardware). In addition, as before, each layer of the trie will need to be implemented in a different physical memory if the design is to be pipelined for lookups in a single memory access time. This leads to problems because the memory cannot be dynamically shared; it could happen that a single layer of the trie could exhaust its memory while other layers still have free space. Finally, with this approach, the total number of elements that can be stored in the table is not capacity deterministic. It depends on the distribution of prefixes and the total memory size.
FIG. 6 (background art) shows an example implementation of a direct memory lookup scheme. Again, the controller 2 (FIG. 1) sends the search key 3 to the search engine 4, the search engine 4 sends a match address 5 to the AC memory 6, and an AC entry is retrieved from the AC memory 6 and returned as the search result 7 to the controller 2.
The search engine 4 here particularly contains a primary table 8 and a secondary table 9. The highest n bits of the search key 3 are used as an index into the primary table 8 (typically implemented in RAM) to retrieve the match address 5, and the secondary table 9 (in RAM or CAM) is used to store the prefixes with lengths longer than n. In this case, the primary table 8 may look like an expanded 6-bit trie node. Within it, for each 6-bit value, only the prefix with the longest prefix length is stored. If the longest prefix length for the location is less than six, it must “expand” and fill out the prefix information into neighboring locations just like the “expanded multi-bit trie” does. Thus, the same table maintenance issue as found in trie-based searching is evident. In fact, since this approach inherits the properties of an expanded 6-bit trie, on average, it needs more entry updates when performing maintenance, compared with expanded multi-bit tries.
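The maintenance cost of the primary table can be made concrete with a small Python sketch. It models only the primary table 8 with a 6-bit index; the secondary table 9 is omitted, and the names `INDEX_BITS`, `insert_prefix`, and `primary_lookup`, as well as the AC values A and D, are assumptions made for illustration.

```python
INDEX_BITS = 6  # the highest 6 bits of the search key index the primary table


def insert_prefix(table, bits, ac):
    """Expand a prefix (at most INDEX_BITS long) into the table; return slots written."""
    pad = INDEX_BITS - len(bits)
    base = (int(bits, 2) << pad) if bits else 0
    writes = 0
    for slot in range(base, base + (1 << pad)):
        entry = table[slot]
        # Only overwrite a slot if it holds no prefix or a shorter (or equal) one.
        if entry is None or entry[0] <= len(bits):
            table[slot] = (len(bits), ac)
            writes += 1
    return writes


def primary_lookup(table, key_bits):
    """Index the table with the top INDEX_BITS of the key and return the stored AC."""
    entry = table[int(key_bits[:INDEX_BITS], 2)]
    return None if entry is None else entry[1]


table = [None] * (1 << INDEX_BITS)  # each slot holds (prefix_length, ac) or None
writes_p1 = insert_prefix(table, "00", "A")    # placeholder AC value
writes_p2 = insert_prefix(table, "0000", "B")
writes_p5 = insert_prefix(table, "010", "D")   # placeholder AC value
print(writes_p1, writes_p2, writes_p5)
print(primary_lookup(table, "00001011"))
```

The lookup itself is a single indexed read, but inserting the 2-bit prefix 00* touches 16 of the 64 slots, which illustrates why a wider index makes table updates more expensive.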
In summary, all existing schemes have problems of performance, scalability, generality, cost, or maintenance. Lookup schemes based on tries and binary search are too slow and do not scale well; present CAM-based solutions are expensive; and direct memory lookups are not easy to maintain.
Accordingly, it should be apparent that there is a need for a scheme which is cheap, deterministic for the worst case, high speed, supports large databases, and is easily maintained by hardware circuits.