A prefix search is used in networking to route and classify packets. The route to be used for a packet and its classification are determined by finding the longest matching prefix in a set. For example, a packet using IPv6 (Internet Protocol version 6) has a 128-bit destination address. A router determines the output port over which such a packet should be routed by searching a set of variable-length binary strings to find the longest string that matches a prefix of the destination address. For classification purposes, other fields of the header, such as the port number, may also be included in the string to be matched.
To illustrate the problem of prefix search, consider the list of prefix character strings shown in FIG. 1 in alphabetical order. The principle is the same with binary strings. Given a search string, such as "cacea", the goal is to find the longest stored string that exactly matches a prefix of this string. Although a simple linear search of the list finds that this string falls between "cab" and "cad", one must scan several strings backward from this point to find that the longest matching prefix is "ca". In actual routing tables, which may contain hundreds of thousands of entries, the matching prefix may be far from the point where the linear search fails. An optimized data structure is needed to efficiently find the matching prefix.
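The backward scan described above can be sketched in a few lines of Python. This is only an illustration: the prefix list below is a hypothetical stand-in for FIG. 1, which is not reproduced here, and the function name is an assumption. The sketch locates the point where the search string would fall in the sorted list and then walks backward until a stored string matches a prefix of it, counting the strings examined.

```python
import bisect

def longest_prefix_by_scan(sorted_prefixes, target):
    """Return (longest matching prefix or None, strings examined)."""
    # Position where the target would be inserted in the sorted list.
    pos = bisect.bisect_right(sorted_prefixes, target)
    examined = 0
    # Every prefix of the target sorts at or before the target, and a longer
    # prefix sorts after a shorter one, so the first hit going backward
    # is the longest matching prefix.
    for i in range(pos - 1, -1, -1):
        examined += 1
        if target.startswith(sorted_prefixes[i]):
            return sorted_prefixes[i], examined
    return None, examined

# Hypothetical list, sorted alphabetically (not the actual FIG. 1 contents).
prefixes = ["c", "ca", "caa", "caab", "cab", "cad", "cc"]
print(longest_prefix_by_scan(prefixes, "cacea"))  # ('ca', 4)
```

Even in this small list, four strings are examined after the insertion point is found; in a large table the scan may be much longer.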
A prior method for performing longest prefix matching employs a data structure called a trie. A trie for the prefix list of FIG. 1 is shown in FIG. 2. As shown, the trie is a tree structure in which each node of the tree resolves one character of the string being matched. Each internal node consists of a list of characters. Associated with each character is an outgoing link either to another internal node, a rectangle in the figure, or to a leaf node, a circle in the figure. A slash at the start of a node indicates that a prefix leading to that node with no additional characters is part of the list. Each leaf node holds the result data associated with the prefix leading to that leaf node, and in the figure, the leaf nodes are labeled with these prefixes. The result data might, for example, be the output port associated with a data packet and a flow-identifier.
To search the trie, one starts at the root node, node 1 in the figure, and traverses the tree by following the outgoing link at each node corresponding to the next character in the string to be matched. When no matching outgoing link can be found, the longest matching prefix has been found. For example, given the string "cacea" we start at node 1. The "c" directs us to node 4. The "a" directs us to node 8. As we cannot find a match for the next character, "c", at node 8, we follow the link associated with the slash to the leaf node associated with the longest matching prefix, "ca". Note that if prefix "ca" were not in the list, we would need to backtrack at this point to node 4 for prefix "c".
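A minimal sketch of such a trie search follows, using nested Python dictionaries as nodes. The names and the prefix list are assumptions made for illustration, and the node numbering of FIG. 2 is not modeled. The empty-string key plays the role of the slash, marking that a stored prefix ends at a node. Instead of backtracking, this sketch remembers the deepest marked node seen on the way down, which yields the same result.

```python
END = ""  # plays the role of the slash: a stored prefix ends at this node

def build_trie(prefixes):
    root = {}
    for prefix in prefixes:
        node = root
        for ch in prefix:
            node = node.setdefault(ch, {})  # one node per character
        node[END] = prefix                  # result data: here, the prefix itself
    return root

def trie_longest_prefix(root, target):
    node = root
    best = node.get(END)
    for ch in target:                       # one node visit per character
        node = node.get(ch)
        if node is None:                    # no matching outgoing link
            break
        if END in node:
            best = node[END]                # deepest stored prefix so far
    return best

trie = build_trie(["c", "ca", "cab", "cad", "cc"])
print(trie_longest_prefix(trie, "cacea"))  # ca
```

Each loop iteration corresponds to one node visit, so the number of memory accesses grows with the length of the matched string.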
Another prior method for prefix matching is to perform binary search on a table. However, as described by Radia Perlman, Interconnections, Bridges and Routers, Addison-Wesley, 1992, pages 233-239, and shown in FIG. 3, since binary search will find the closest matching string, rather than the longest matching prefix, we must make two modifications to the list to apply this technique. First, we insert two entries for every entry in the list that encloses other entries, that is, that would serve as a longest matching prefix for another prefix in the list but for the other prefix itself being in the list. One of those entries is terminated by the symbol 0, which comes alphabetically before all characters, and one by the symbol 1, which comes alphabetically after all characters. These two entries act as parentheses enclosing all entries that contain the prefix. Second, we attach to each entry in the list not ending in a 0 a pointer to the nearest enclosing entry. FIG. 3 shows the list of FIG. 1 augmented in this manner. Note that the prefix "ca" has been replaced by the two entries "ca0" and "ca1" that bracket all entries containing the prefix "ca" and that all of these entries have a pointer back to "ca0".
To search the augmented list of FIG. 3 for the longest matching prefix, one searches for a string equal to a prefix of the target or the alphabetically closest pair of strings. Strings ending in "0" or "1" never exactly match a prefix of the target string because "0" and "1" do not match any character of the target string. If the search finds an exact prefix of the target string, the result data associated with the string is retrieved. Otherwise, the search found the closest pair of stored strings, Sa and Sb. In this case there are three possibilities:
1. If Sa ends in a "0" symbol, then the longest matching prefix is this string with the "0" removed.
2. If Sb ends in a "1" symbol, then the longest matching prefix is this string with the "1" removed.
3. Otherwise, an enclosing pointer from Sa is followed to find a string ending in a "0" symbol which encloses Sa and the nearest match is that string with the "0" symbol removed.
For example, a search for "cacea" will end between "cab" and "cad". Since this is not an exact match, "cab" does not end in "0", and "cad" does not end in "1", the pointer from "cab" is followed back to "ca0" giving the longest matching prefix, "ca". Similarly, a search for "cb" will end between "ca1" and "cc" and follow the pointer from "ca1" back to the common prefix, "c".
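The augmented-list search can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited reference: the function names and the prefix list are hypothetical, the "0" and "1" symbols are represented by the characters "\x00" and "\uffff" (assumed to sort below and above every character of a real prefix or target), and each enclosing pointer is stored as the enclosing prefix itself rather than as a pointer to its "0" entry.

```python
import bisect

LOW, HIGH = "\x00", "\uffff"  # stand-ins for the "0" and "1" symbols

def build_augmented(prefixes):
    """Return (sorted keys, parallel list of (stored prefix, enclosing prefix))."""
    pset = set(prefixes)

    def encloser(p):
        # Longest proper prefix of p that is itself stored, or None.
        for n in range(len(p) - 1, 0, -1):
            if p[:n] in pset:
                return p[:n]
        return None

    enclosing = {encloser(p) for p in pset} - {None}
    entries = []
    for p in pset:
        if p in enclosing:
            entries.append((p + LOW, p, None))          # opening bracket
            entries.append((p + HIGH, p, encloser(p)))  # closing bracket
        else:
            entries.append((p, p, encloser(p)))
    entries.sort(key=lambda e: e[0])
    return [e[0] for e in entries], [(e[1], e[2]) for e in entries]

def augmented_search(keys, info, target):
    # Appending LOW lets a target equal to an enclosing prefix land on
    # that prefix's opening bracket instead of just before it.
    pos = bisect.bisect_right(keys, target + LOW)
    if pos == 0:
        return None                         # target sorts before every entry
    key_a, (prefix_a, encl_a) = keys[pos - 1], info[pos - 1]
    if key_a == prefix_a and target.startswith(prefix_a):
        return prefix_a                     # exact prefix match on Sa
    if key_a.endswith(LOW):
        return prefix_a                     # case 1: Sa ends in "0"
    if pos < len(keys) and keys[pos].endswith(HIGH):
        return info[pos][0]                 # case 2: Sb ends in "1"
    return encl_a                           # case 3: follow Sa's pointer

keys, info = build_augmented(["c", "ca", "caa", "cab", "cad", "cc"])
print(augmented_search(keys, info, "cacea"))  # ca
print(augmented_search(keys, info, "cb"))     # c
```

The two calls reproduce the examples in the text: "cacea" falls between "cab" and "cad" and resolves through the pointer from "cab", while "cb" falls after the closing bracket of "ca" and resolves through that bracket's pointer to "c".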
While the trie structure and binary search strategy work, they are not well suited for implementation in a hardware search engine. The trie requires a memory access for every character of a string and possible backtracking if a match is not found. This makes it inefficient in terms of memory bandwidth usage. The binary search strategy requires storing two result pointers for the majority of prefixes, one for a direct match and one to the enclosing string or its associated result. This makes it inefficient in terms of memory usage.
The present invention relies on a data structure, an augmented tree, that stores prefix sets in a manner that enables efficient searching and a hardware engine for searching the augmented tree. The augmented tree stores the prefix set with enclosing prefixes in a tree structure similar to a B-tree, a tree with a radix greater than one previously used to efficiently search for exact matches by optimizing the tree node size to the size of data blocks retrieved from storage discs.
In accordance with the invention, a prefix search data structure comprises a tree structure having internal nodes for identifying subsequent nodes from prefix search keys. Leaf nodes each comprise a set of prefix keys to be compared to a prefix search key. The sets of prefix keys of plural leaf nodes together form a list of prefix keys including enclosing prefix key pairs.
In preferred embodiments, the internal nodes include partitioning nodes, each comprising a set of prefix keys to be compared to a prefix search key. Prefix keys from one node point to subsequent nodes through a common pointer and indexes. Each internal node may also comprise a node-size parameter that defines the size of that node and a child-node-size parameter that defines the size of fixed size child nodes to which the common pointer of the internal node points.
Each of plural leaf nodes may comprise a single enclosing pointer associated with an enclosing prefix key which encloses a prefix key of the leaf node. Preferably, a common pointer of each leaf node is a result pointer to a block of results for the node, and the enclosing pointer points to a result in a block of results for another node. Each leaf node may comprise a node-size parameter that defines the size of the leaf node as well as the size of the results block.
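One way to picture the node contents described above is the following sketch. All field names, the address arithmetic, and the example values are assumptions made for illustration; the text does not fix a particular encoding. The internal node locates a child by adding an index, derived from comparing the search key with the node's prefix keys, to a single common pointer. The leaf node holds one common result pointer and one enclosing pointer.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class InternalNode:
    keys: List[str]        # sorted partitioning prefix keys
    child_base: int        # common pointer: address of the first child node
    node_size: int         # size of this node (units are an assumption)
    child_node_size: int   # size of each fixed-size child node

    def child_address(self, search_key: str) -> int:
        # The index of the child is the number of keys <= the search key.
        index = bisect.bisect_right(self.keys, search_key)
        return self.child_base + index * self.child_node_size

@dataclass
class LeafNode:
    keys: List[str]                   # prefix keys compared to the search key
    result_base: int                  # common pointer to this node's result block
    enclosing_result: Optional[int]   # single enclosing pointer into another block
    node_size: int                    # also taken as the size of the result block

    def result_address(self, key_index: int) -> int:
        # Results are assumed to be stored in key order within the block.
        return self.result_base + key_index

node = InternalNode(keys=["cab", "cc"], child_base=1000,
                    node_size=64, child_node_size=64)
print(node.child_address("cacea"))  # 1064
```

Because every child is reached through the same base pointer, the node stores one pointer regardless of how many keys it holds.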
Individual nodes can be sized to optimize memory timing. In particular, the data retrieved for any node may correspond to the size of the data block most efficiently retrieved from memory storage. Preferably, each of the nodes identifies a middle prefix key and low and high sets of prefix keys. Each node can be laid out to optimize memory bandwidth by storing the middle prefix key ahead of the low and high prefix keys so that either the low or the high set of keys in the node can be conditionally fetched based on a comparison of the search key with the middle prefix key. Within each low and high set of prefix keys, middle prefix keys may be further identified with low and high sets of prefix keys.
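The middle-first layout can be sketched as follows. This is one possible reading in which the layout is applied recursively to the low and high sets; the function names and the example keys are assumptions. The search reads the middle key, decides which set to continue in, and never touches the other set, so the count of keys fetched stays well below the node's key count.

```python
def layout_middle_first(sorted_keys):
    """Reorder keys as [middle, low set, high set], recursively."""
    if not sorted_keys:
        return []
    mid = len(sorted_keys) // 2
    return ([sorted_keys[mid]]
            + layout_middle_first(sorted_keys[:mid])
            + layout_middle_first(sorted_keys[mid + 1:]))

def search_node(layout, target):
    """Return (number of keys <= target, number of keys fetched)."""
    base, size, rank, fetched = 0, len(layout), 0, 0
    while size > 0:
        low_size = size // 2
        middle = layout[base]          # the middle key is stored first
        fetched += 1
        if target < middle:
            # Continue in the low set; the high set is never fetched.
            base, size = base + 1, low_size
        else:
            # Skip the middle key and the low set; continue in the high set.
            rank += low_size + 1
            base, size = base + 1 + low_size, size - low_size - 1
    return rank, fetched

keys = ["caa", "cab", "cad", "cc", "cd", "ce", "cf"]
node = layout_middle_first(keys)
print(node)                        # ['cc', 'cab', 'caa', 'cad', 'ce', 'cd', 'cf']
print(search_node(node, "cacea"))  # (2, 3)
```

Here three of the seven keys are fetched to establish that two stored keys sort at or before "cacea".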
The data structure results in significant savings in both storage and bandwidth compared to previous methods for storing prefix sets. To further reduce storage, the stored prefix keys of select nodes may be only portions of the full prefixes. Each of the select nodes may include an indication of the portion of the prefix search key to be compared to the prefix keys. Further, one or more tree nodes may comprise a table indexed by a prefix search key rather than a list of partitioning strings. A table may also be indexed by only a portion of the prefix search key.
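A table-indexed node that uses only a portion of the search key might look like the following sketch. The dictionary fields, the use of binary strings for keys, and the child labels are all hypothetical; the sketch only shows the idea of extracting a field of the search key and using it directly as an index instead of comparing against a list of partitioning strings.

```python
def make_table_node(offset, width, children):
    # The table needs one entry for every value of the selected bit field.
    assert len(children) == 2 ** width
    return {"offset": offset, "width": width, "children": children}

def table_lookup(node, search_key):
    """Index the table with bits [offset, offset + width) of the search key."""
    start = node["offset"]
    field = search_key[start:start + node["width"]]
    return node["children"][int(field, 2)]

# Hypothetical node indexed by the first two bits of a binary search key.
root = make_table_node(offset=0, width=2,
                       children=["node_a", "node_b", "node_c", "node_d"])
print(table_lookup(root, "10110"))  # node_c
```

A lookup of this kind replaces several key comparisons with a single indexed read, at the cost of a table whose size grows as a power of two with the width of the field.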