This invention relates to a method for storing and managing data. More particularly, this invention relates to a data structure representing a set of prefixes and a method for searching the data structure. The data structure and the search method are particularly advantageous for implementation in digital computer hardware. The primary application of current interest is to semiconductor integrated circuits used for packet classification in high-speed, multiservice Internet routers. However, the technique may be useful in a variety of applications involving data that needs to be prioritized, or in which structure in the data needs to be determined and the data then classified. Once the data has been classified, action on the data can be taken more quickly and efficiently.
In the past, in the packet-switched networking field, Internet Protocol (IP) traffic handling processes have required relatively complex searches and analyses of packet header information to develop routing and processing instructions. Recently, IP destination address lookup issues relating to techniques for routing a packet to a destination have received a great deal of attention from the networking community. The following references may be consulted on the state of the art related to the invention herein described:
[1] M. Degermark, A. Brodnik, S. Carlsson, S. Pink, "Small Forwarding Tables for Fast Routing Lookups," Proc. ACM SIGCOMM '97, Cannes (Sep. 14-18, 1997) (describing the Lulea algorithm).
[2] V. Srinivasan, G. Varghese, "Faster IP Lookups using Controlled Prefix Expansion," Proc. SIGMETRICS '98.
A number of patents of interest on lookup algorithms are known in the art.
It is helpful to understand the background of the closest known method for searching tree structures of data. Note the following definitions:
A trie is a tree structure organized for searching.
A trie element is that portion of a tree structure at a single node.
A prefix is a string of characters that appears at the beginning of a longer string of characters.
A stride is a number of levels in a tree accessed in a single read operation.
A target string is a string of characters to be classified.
In many cases of practical interest the characters in a prefix are binary digits, i.e., ones and zeroes. In the example prefix database 10 in FIG. 1 there are nine binary prefixes 11-19, each terminated by *, a symbol which represents the remaining arbitrary binary digits in a longer, fixed-length string. If the total length of the string is k digits, then P1 represents any string of k arbitrary binary digits. Similarly, P3 represents a k-bit string that begins with 00 and is followed by any string of k-2 binary digits.
A prefix database can be used to classify strings of characters into subsets according to which prefix a particular string matches. The prefix database structure can include exact matches with a target string, i.e., the data to be analyzed can appear in the database.
The matching process is ambiguous unless the classification requires that the matching prefix be the longest possible match in the database. For example, in FIG. 1, the 10-bit string 1000011000 matches P1, P2, P6 and P9, but P9 is the longest of these, matching the first 7 bits of the 10-bit string.
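The longest-match rule can be illustrated with a brute-force sketch in Python. This is only an illustration, not a method described in this document. The bit patterns for P1, P2, P3 and P9 follow the text, while the pattern for P6 is an assumption, since FIG. 1 is not reproduced here. The names PREFIXES, matching_prefixes and longest_prefix_match are hypothetical.

```python
# Partial prefix table. P1, P2, P3 and P9 follow the text; the bit pattern
# for P6 is an ASSUMPTION made for illustration only.
PREFIXES = {
    "P1": "",          # "*" matches every string
    "P2": "1",
    "P3": "00",
    "P6": "1000",      # assumed value
    "P9": "1000011",
}

def matching_prefixes(target):
    """Return the labels of all prefixes that the target string begins with."""
    return [label for label, bits in PREFIXES.items() if target.startswith(bits)]

def longest_prefix_match(target):
    """Resolve the ambiguity by keeping only the longest matching prefix."""
    return max(matching_prefixes(target), key=lambda label: len(PREFIXES[label]))

matches = matching_prefixes("1000011000")
best = longest_prefix_match("1000011000")
```

Under these assumptions, the 10-bit string 1000011000 matches P1, P2, P6 and P9, and the longest match is P9.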
A database of binary prefixes can be represented by a binary tree 20 specialized for searching. This search tree 20 is an example of a data structure called a trie and is convenient for finding the longest prefix match. Nodes (shown as small circles in FIG. 1) in the trie are labeled as Prefix Nodes or Vacant Nodes according to whether they represent a prefix or not.
The search for a match to a target string progresses by examining one bit of the target at a time. Before examining the target's first bit, a match with the Prefix Node 21 for P1 is known to exist, since P1 matches all strings. If the first bit is a 0, take the left branch 22 of the trie. If the first bit is a 1, take the right branch 23 of the trie. In either case, determine if a Prefix Node is encountered. In the example of FIG. 1, taking right branch 23 leads to a match with the Prefix Node 25 for P2.
The process proceeds by taking the left or right branch according to whether the next bit is a 0 or a 1, noting the Prefix Nodes encountered and tracing a path down the trie until either a final node 26-30 is reached (a leaf) or there is no longer a path on the trie (since the node is not in the database). In either case, the last Prefix Node found represents the longest prefix match.
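The bit-by-bit walk described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Node class and the build_trie and search functions are hypothetical names, and only the four prefixes whose bit patterns are given in the text (P1, P2, P3, P9) are loaded.

```python
class Node:
    """One node of a binary trie."""
    def __init__(self):
        self.children = {}   # maps "0" or "1" to a child Node
        self.label = None    # prefix label for a Prefix Node, None for a Vacant Node

def build_trie(prefixes):
    """Insert each prefix along the path spelled out by its bits."""
    root = Node()
    for label, bits in prefixes.items():
        node = root
        for bit in bits:
            node = node.children.setdefault(bit, Node())
        node.label = label
    return root

def search(root, target):
    """Walk down the trie one bit at a time, remembering the last Prefix Node."""
    node, best = root, root.label
    for bit in target:
        node = node.children.get(bit)
        if node is None:          # no longer a path on the trie
            break
        if node.label is not None:
            best = node.label     # a longer match than any seen so far
    return best

trie = build_trie({"P1": "", "P2": "1", "P3": "00", "P9": "1000011"})
result = search(trie, "1000011000")
```

The search stops either at a leaf or where the path ends, and in both cases returns the last Prefix Node seen, which is the longest prefix match.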
The Lulea algorithm is the known algorithm that encodes prefix tables most compactly. It uses multibit trie elements containing bitmaps to minimize wasted storage. The Lulea scheme does well in terms of worst-case storage, using only 200 Kbytes of memory to store the current MAE East database in a fully structured form. (The MAE East database is a database that contains over 40,000 Internet prefixes for IP addresses and requires over 900 Kbytes of ASCII character storage in its uncompressed form.) One way to describe the Lulea scheme is to consider the database 10 of FIG. 1 remapped to the multibit trie elements 102, 104, 106, 108 as in FIG. 2A.
The various possible target strings are shown in FIG. 2A as 3-bit strides at the left of each of the trie elements 102, 104, 106 and 108. The longest matching prefix at each stride is shown in the prefix column of the trie elements in FIG. 2A. For example, the target string 111001 traces a path down the right hand side of the trie 20 in FIG. 1 to the leaf 30 and the prefix P8. In FIG. 2A the leading 111 portion of the target string selects the last entry in the trie element 102 which contains a pointer to trie element 108 where the last three bits of the target string 001 select the prefix P8.
This remapped trie 100 is called a controlled prefix expansion and is subject, in FIG. 2B, to an optimization known as "leaf pushing." Leaf pushing is intended to reduce memory requirements by making each trie element entry contain either a pointer or a prefix label, but not both. Thus entries like "100" (binary) in the fifth row of the root trie element 102 of FIG. 2A, which have both a pointer and a prefix label, must be modified. The optimization pushes the prefix label down to the vacant leaves 124, 125, 126, 127 of the trie element 104. Since these leaves have no pointer, there is room to store a prefix label. This is shown in FIG. 2B by pointer 205 pointing from trie element 102′ to trie element 104′, which has P2 pushed down into the previously empty slots of trie element 104′, and by pointer 208, which has P5 pushed down into the previously empty slots of trie element 108′. The occupied leaves 120, 121, 122, 123 do not receive the pushed prefix because P6 is a longer match than P2 and therefore the shorter match can be ignored.
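Leaf pushing can be sketched in Python as a recursive pass over a multibit trie. This is a hedged illustration only: the tuple representation (prefix, child) and the function name leaf_push are hypothetical, and the layout of trie element 104 (P6 in its first four entries, the last four vacant) is assumed from the description rather than taken from the figure.

```python
def leaf_push(element, inherited=None):
    """Return a copy of `element` in which every entry holds either a prefix
    label or a child element, never both.

    `element` is a list of (prefix, child) tuples; either field may be None.
    `inherited` is the prefix label pushed down from the parent entry.
    """
    pushed = []
    for prefix, child in element:
        # An entry's own prefix is a longer match than anything inherited.
        best = prefix if prefix is not None else inherited
        if child is None:
            pushed.append((best, None))                     # leaf: keep a label only
        else:
            pushed.append((None, leaf_push(child, best)))   # pointer only
    return pushed

# ASSUMED layout of trie element 104: P6 in the four occupied leaves,
# nothing in the four vacant leaves.
element_104 = [("P6", None)] * 4 + [(None, None)] * 4
# The "100" entry of the root element holds both the label P2 and a pointer.
root_entry_100 = [("P2", element_104)]

result = leaf_push(root_entry_100)
```

After the pass, the root entry holds only the pointer, the vacant leaves hold P2, and the occupied leaves still hold P6.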
Conceptually, the Lulea scheme starts with a leaf-pushed trie 101 and replaces, with a single value, all the consecutive entries in a trie element that have the same value. This can greatly reduce the amount of storage required in a trie element. To allow trie indexing to take place even after the elements have been compressed, a bit map with 1's corresponding to retained positions and with 0's corresponding to the removed positions is associated with each compressed element.
For example, consider the partially compressed root element 102′ in FIG. 2B. After leaf pushing, the root element 102′ has the sequence of values P3 (201), P3 (202), P1 (203), P1 (204), ptr1 (205), P4 (206), P2 (207), ptr2 (208) (where ptr1 205 is a pointer to the trie element 104′ and ptr2 208 is the pointer to the trie element 108′). Referring to FIG. 2C, this sequence of values is compressed by removing redundant values and indicating by the placement of 1's in a bitmap the location of the remaining unique values. For example, original trie element 102′ is replaced with a bit map (10101111) 302 indicating the removed positions P3 202 and P1 204 by 0's and a compressed list 304 comprising the prefix labels and pointers (P3, P1, ptr1, P4, P2, ptr2). The result of doing this for all four trie elements is shown in FIG. 2C.
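The compression of the root element can be reproduced with a short Python sketch. The function name compress and the string representation of the bitmap are assumptions made for illustration. The input sequence and the expected bitmap and list are those given in the text.

```python
def compress(entries):
    """Collapse runs of identical consecutive values.

    Returns (bitmap, kept): bitmap has a '1' at each retained position and a
    '0' at each removed position; kept is the list of retained values.
    """
    bits, kept = [], []
    for position, value in enumerate(entries):
        if position == 0 or value != entries[position - 1]:
            bits.append("1")      # first value of a run is retained
            kept.append(value)
        else:
            bits.append("0")      # repeat of the previous value is removed
    return "".join(bits), kept

root_entries = ["P3", "P3", "P1", "P1", "ptr1", "P4", "P2", "ptr2"]
bitmap, compressed = compress(root_entries)
```

For the leaf-pushed root element this yields the bitmap 10101111 and the six-entry list (P3, P1, ptr1, P4, P2, ptr2).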
The search of a trie element now consists of using a number of bits specified by the stride (e.g., 3 in this case) to index into each trie element starting with the root and continuing until a prefix label or a 0 is encountered. In the latter case, it is necessary to determine the appropriate action. For example, assume the data structure shown in FIG. 2C and a search for a target string that starts with 111111. Consider the first three bits (111) which index into the last position 208′ in the root element bitmap 302. Since position 208′ is the sixth bit set to 1 (counting from the left), index into the sixth entry of the compressed list 304, which is a pointer ptr2 208″ to the rightmost trie element 108″. Here, use the next 3 bits of the target string (also 111) to index into the eighth bit or last position again. Since this bit is a 0 in bitmap 401, the search is terminated but it is still necessary to retrieve the best matching prefix. This is done by counting the number of bits set before the eighth or last position (4 bits) and then indexing into the 4th entry 403 in the compressed trie element 402, which gives the action associated with prefix label P5. In the present networking art, this action is likely to be the next hop address.
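Indexing into a compressed element amounts to counting the 1 bits up to the indexed position. The Python sketch below shows this for the root element only, using the bitmap and compressed list given in the text. The function name lookup is hypothetical, and the sketch handles a single element rather than the full multi-element search.

```python
def lookup(bitmap, kept, index):
    """Return the entry of a compressed trie element selected by `index`.

    The number of 1 bits at positions 0..index (inclusive) gives the 1-based
    position in the compressed list. When the indexed bit is 0, this equals
    the number of bits set before it, i.e. the value that was retained for
    the whole run.
    """
    rank = bitmap[: index + 1].count("1")
    return kept[rank - 1]

ROOT_BITMAP = "10101111"
ROOT_LIST = ["P3", "P1", "ptr1", "P4", "P2", "ptr2"]

entry_111 = lookup(ROOT_BITMAP, ROOT_LIST, 0b111)   # sixth bit set to 1
entry_001 = lookup(ROOT_BITMAP, ROOT_LIST, 0b001)   # bit is 0, one bit set before it
```

With stride bits 111 the lookup returns ptr2, and with stride bits 001 (a removed position) it returns P3.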
The Lulea scheme specifies a trie search that uses three strides of 16, 8 and 8. Without compression, the initial 16 bit array would require 64K entries of at least 16 bits each, and the remaining strides would require the use of large 256 entry trie elements. With the compression specified in the Lulea scheme, the MAE East prefix database requires only 200 Kbytes of storage. However, the use of leaf pushing makes insertion inherently slow in the Lulea scheme, since a single update can require the modification of many elements of the trie.
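The uncompressed sizes quoted above follow directly from the stride widths, as the short Python calculation below shows. The variable names are hypothetical, and the entry width is taken as exactly 16 bits, which the text gives only as a lower bound.

```python
FIRST_STRIDE_BITS = 16   # first stride of the 16, 8, 8 scheme
LATER_STRIDE_BITS = 8    # second and third strides
ENTRY_BITS = 16          # ASSUMED: the text says "at least 16 bits"

first_level_entries = 2 ** FIRST_STRIDE_BITS              # 64K entries
first_level_bytes = first_level_entries * ENTRY_BITS // 8
later_level_entries = 2 ** LATER_STRIDE_BITS              # entries per later element
```

Under this assumption the first-level array alone occupies 131,072 bytes (128 Kbytes) before compression, and each later trie element has 256 entries.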
In the absence of deadlines and for small prefix databases, known software methods are sufficient to search for the longest prefix match. However, for many practical applications, such as packet classification in large dynamically-updated databases, there are deadlines or time limitations. Furthermore, the prefix database can contain many tens of thousands, or soon many hundreds of thousands, of entries which can slow a search. The resulting binary trie is exceedingly large and populated only sparsely with Prefix Nodes. What is needed is a method that facilitates a rapid search and minimizes the storage required.
According to the invention, in random access memory, a data structure of trie elements of compact and fixed size is provided in order to store elements of a hierarchical prefix structure such that the data structure can be searched quickly. A trie element according to the invention contains the data in one stride of the search through the prefix data structure. According to the invention, the trie element may contain 1) a description of the tree structure associated with the trie element, 2) a description of the links to the next level trie element and 3) a pointer to the storage location of the next level trie element. The prefix data structure has a first level trie element, and at least one second level trie element. The trie element includes a first code of the first level trie element describing the prefixes contained in the first level trie element, a second code specifying paths between the first level trie element and all children of the first level trie element (such children are second level trie elements), and a pointer for linking the first level trie element with one of the second level trie elements. Each of the first and second trie elements and the pointer are of a fixed, predefined size.
The present invention represents an advance over the best algorithm that is known, the Lulea algorithm. First, the present invention uses a completely different encoding scheme that relies on two codes per trie element, with each code corresponding to an independent bitmap. Second, only one memory reference per trie element is required, as opposed to two or three per trie element in Lulea. Third, fast update times are guaranteed since the number of operations is bounded by the depth of the trie. In the Lulea scheme, a single update can cause almost the entire database to be rewritten. Fourth, unlike Lulea, the present core algorithm can be instantiated to take advantage of the structure of known memory devices.
While the invention has been disclosed with specific reference to a telecommunication application, the data structure and search method of this invention promote rapid searching of any prefix-type database for the longest prefix match. The method is well suited to implementation in computer hardware, provides a compact storage structure, is scalable to large databases, and can be applied to a pressing problem in the design of Internet multiservice routers.
There are a number of optimizations and ramifications suitable for adjusting the specific embodiments to various specialized needs. The invention will be better understood upon reference to the following detailed description in connection with the accompanying drawings.