A prefix search is used in networking to route and classify packets. The route to be used for a packet and its classification are determined by finding the longest matching prefix in a set. For example, a packet using IPv6 (Internet Protocol version 6) has a 128-bit destination address. A router determines the output port over which such a packet should be routed by searching a set of variable-length binary strings to find the longest string that matches a prefix of the destination address. For classification purposes, other fields of the header, such as the port number, may also be included in the string to be matched.
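As a minimal sketch of what "longest matching prefix" means for binary strings, the following Python function checks every stored prefix against an address written as a string of bits. The function name, the routing table, and the port numbers are hypothetical and chosen only for illustration; a real router would not use a linear scan.

```python
def longest_prefix_match(prefixes, address_bits):
    """Return the longest string in `prefixes` that is a prefix of
    `address_bits`, or None if no stored prefix matches."""
    best = None
    for p in prefixes:
        if address_bits.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best


# Hypothetical table mapping variable-length bit-string prefixes to output ports.
routes = {"0": 3, "10": 1, "1011": 2}

# Both "10" and "1011" match the start of this address; the longer one wins.
match = longest_prefix_match(routes, "10110001")
port = routes[match]
```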
To illustrate the problem of prefix search, consider the list of prefix character strings shown in FIG. 1 in alphabetical order. The principle is the same with binary strings. Given a search string, such as “cacea”, the goal is to find the longest stored string that exactly matches a prefix of this string. Although a simple linear search of the list finds that this string falls between “cab” and “cad”, one must scan several strings backward from this point to find that the longest matching prefix is “ca”. In actual routing tables, which may contain hundreds of thousands of entries, the matching prefix may be far from the point where the linear search fails. An optimized data structure is needed to efficiently find the matching prefix.
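The backward scan described above can be sketched in Python as follows. The prefix list here is a hypothetical stand-in for FIG. 1, which is not reproduced in this text, and the position in the sorted list is located with the standard `bisect` module rather than a linear search. Because every prefix of the target sorts at or before the target, the first match found while scanning backward is the longest one.

```python
import bisect


def lpm_sorted_scan(sorted_prefixes, target):
    """Locate where `target` falls in the sorted list, then scan backward
    until a stored string is a prefix of `target`."""
    i = bisect.bisect_right(sorted_prefixes, target)
    for j in range(i - 1, -1, -1):
        if target.startswith(sorted_prefixes[j]):
            return sorted_prefixes[j]
    return None


# Hypothetical list in alphabetical order (not the actual contents of FIG. 1).
prefix_list = ["b", "c", "ca", "caab", "caac", "cab", "cad", "cc"]

# "cacea" falls between "cab" and "cad"; the scan passes "cab", "caac",
# and "caab" before reaching "ca".
found = lpm_sorted_scan(prefix_list, "cacea")
```

In the worst case the backward scan visits a large part of the table, which is the inefficiency the paragraph above points out.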
A prior method for performing longest prefix matching employs a data structure called a trie. A trie for the prefix list of FIG. 1 is shown in FIG. 2. As shown, the trie is a tree structure in which each node of the tree resolves one character of the string being matched. Each internal node consists of a list of characters. Associated with each character is an outgoing link either to another internal node, a rectangle in the figure, or to a leaf node, a circle in the figure. A slash at the start of a node indicates that a prefix leading to that node with no additional characters is part of the list. Each leaf node holds the result data associated with the prefix leading to that leaf node, and in the figure, the leaf nodes are labeled with these prefixes. The result data might, for example, be the output port associated with a data packet and a flow-identifier.
To search the trie, one starts at the root node, node 51 in the figure, and traverses the tree by following the outgoing link at each node corresponding to the next character in the string to be matched. When no matching outgoing link can be found, the longest matching prefix has been found. For example, given the string “cacea”, we start at node 51. The “c” directs us to node 54. The “a” directs us to node 58. As we cannot find a match for the next character, “c”, at node 58, we follow the link associated with the slash to the leaf node associated with the longest matching prefix, “ca”. Note that if prefix “ca” were not in the list, we would need to backtrack at this point to node 54 for prefix “c”.
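A character trie of this kind can be sketched in Python as shown below. This is a simplified illustration, not the structure of FIG. 2: the class and function names, the prefix list, and the result strings are hypothetical, the result data is stored directly on the node where a prefix ends (playing the role of the slash) rather than in a separate leaf node, and the lookup remembers the last result seen on the way down instead of backtracking.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> child TrieNode
        self.result = None   # result data if a stored prefix ends at this node


def build_trie(entries):
    """Build a trie from a mapping of prefix -> result data."""
    root = TrieNode()
    for prefix, result in entries.items():
        node = root
        for ch in prefix:
            node = node.children.setdefault(ch, TrieNode())
        node.result = result
    return root


def trie_lookup(root, target):
    """Follow one link per character of `target`; return the result data of
    the longest stored prefix passed on the way, or None."""
    node = root
    best = root.result
    for ch in target:
        node = node.children.get(ch)
        if node is None:
            break                     # no outgoing link for this character
        if node.result is not None:
            best = node.result        # remember the longest match so far
    return best


# Hypothetical prefix list and result data.
trie = build_trie({"c": "port-c", "ca": "port-ca", "cab": "port-cab",
                   "cad": "port-cad", "cc": "port-cc"})
hit = trie_lookup(trie, "cacea")
```

Remembering the last match during the descent is one way to avoid the backtracking step mentioned above, at the cost of one extra check per node visited.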
Another prior method for prefix matching is to perform binary search on a table. However, as described by Radia Perlman, Interconnections, Bridges and Routers, Addison Wesley, 1992, pages 233-239, and shown in FIG. 3, since binary search will find the closest matching string, rather than the longest matching prefix, we must make two modifications to the list to apply this technique. First, we insert two entries for every entry in the list that encloses other entries, that is, every entry that is itself a prefix of another entry in the list. One of those entries is terminated by the symbol “0”, which comes alphabetically before all characters, and one by the symbol “1”, which comes alphabetically after all characters. These two entries act as parentheses enclosing all entries that contain the prefix. Second, we attach to each entry in the list not ending in a “0” a pointer to the nearest enclosing entry. FIG. 3 shows the list of FIG. 1 augmented in this manner. Note that the prefix “ca” has been replaced by the two entries “ca0” and “ca1” that bracket all entries containing the prefix “ca”, and that all of these entries have a pointer back to “ca0”.
To search the augmented list of FIG. 3 for the longest matching prefix, one searches for a string equal to a prefix of the target or the alphabetically closest pair of strings. Strings ending in “0” or “1” never exactly match a prefix of the target string because “0” and “1” do not match any character of the target string. If the search finds an exact prefix of the target string, the result data associated with the string is retrieved. Otherwise, the search found the closest pair of stored strings, Sa and Sb. In this case there are three possibilities:
1. If Sa ends in a “0” symbol, then the longest matching prefix is this string with the “0” removed.
2. If Sb ends in a “1” symbol, then the longest matching prefix is this string with the “1” removed.
3. Otherwise, an enclosing pointer from Sa is followed to find a string ending in a “0” symbol which encloses Sa, and the longest matching prefix is that string with the “0” symbol removed. For example, a search for “cacea” will end between “cab” and “cad”. Since this is not an exact match, “cab” does not end in “0”, and “cad” does not end in “1”, the pointer from “cab” is followed back to “ca0”, giving the longest matching prefix, “ca”. Similarly, a search for “cb” will end between “ca1” and “cc” and follow the pointer from “ca1” back to the entry for the common prefix, “c”.
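The augmented list and the three cases above can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the implementation from the cited reference: the names `OPEN`, `CLOSE`, `build_augmented`, and `lookup` are hypothetical, the “0” and “1” symbols are represented by the lowest and highest Unicode code points (assumed not to occur in prefixes or targets), and the prefix list stands in for FIG. 1. The sketch also handles one edge case the text does not discuss, namely a target exactly equal to an enclosing prefix.

```python
import bisect

OPEN = "\x00"         # stands in for the "0" symbol: sorts before every character
CLOSE = "\U0010ffff"  # stands in for the "1" symbol: sorts after every character


def build_augmented(prefixes):
    """Return (keys, parent): the sorted augmented list and, for each entry,
    the nearest enclosing OPEN entry (or None)."""
    prefixes = sorted(set(prefixes))
    keys = []
    for p in prefixes:
        if any(q != p and q.startswith(p) for q in prefixes):
            keys += [p + OPEN, p + CLOSE]   # p encloses others: bracket it
        else:
            keys.append(p)
    keys.sort()
    parent, stack = [], []
    for k in keys:
        if k.endswith(CLOSE):
            stack.pop()                      # leave this bracket first
        parent.append(stack[-1] if stack else None)
        if k.endswith(OPEN):
            stack.append(k)
    return keys, parent


def lookup(keys, parent, target):
    i = bisect.bisect_right(keys, target)
    sa = keys[i - 1] if i > 0 else None          # closest entry at or below target
    sb = keys[i] if i < len(keys) else None      # closest entry above target
    # Exact match: a plain stored string that is a prefix of the target.
    if sa is not None and sa[-1:] not in (OPEN, CLOSE) and target.startswith(sa):
        return sa
    # Edge case not covered in the text: target equals an enclosing prefix.
    if sb is not None and sb.endswith(OPEN) and sb[:-1] == target:
        return target
    if sa is None:
        return None
    if sa.endswith(OPEN):                        # case 1
        return sa[:-1]
    if sb is not None and sb.endswith(CLOSE):    # case 2
        return sb[:-1]
    enclosing = parent[i - 1]                    # case 3: follow the pointer
    return enclosing[:-1] if enclosing is not None else None


keys, parent = build_augmented(["c", "ca", "cab", "cad", "cc"])
```

With this hypothetical list, `lookup(keys, parent, "cacea")` falls between “cab” and “cad” and follows the pointer to the “ca” bracket, and `lookup(keys, parent, "cb")` falls between the closing “ca” marker and “cc” and follows the pointer to the “c” bracket, mirroring the two examples in the text.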