This invention relates to search engines for searching large databases or tables of data, and particularly to fast flexible search engines employing longest prefix matching for lookup of prefixes and addresses.
Data communications is most often accomplished by sending a group of data in a packet, together with the address of the location for which the data are intended. Typically, a search for data is accomplished by conducting a binary search based on the address or prefix where the data are retained, or stored, and the data are returned in packets to the requesting address. The search is conducted by a lookup procedure that matches the query address with the stored address.
Lookup procedures are a major source of bottlenecks in high performance compilers and routers. Lookup techniques are used in various compilers, such as those used in designing semiconductor and integrated circuit chips, and in networking applications such as Internet address lookup in high performance data routers, where large search tables or databases are employed. Searching large tables or databases requires more time, more hardware, or both, resulting in more expensive systems with longer search delays and larger memories. The problems associated with such lookups grow with the size of the table or database, the address length, the volume of traffic, and the introduction of higher speed links. In Internet applications, these problems are expected to grow as the next generation IPv6 protocol (which uses 128-bit addresses) is introduced to supplement the current IPv4 (32-bit) protocol.
In our co-pending application Ser. No. 09/679,313, we describe a sorted binary search tree having a fixed number of levels, in which the bottom vertices or leaves contain keys and associated data. The hierarchy vertices or nodes contain one key for each child vertex (node or leaf) and a vertex address pointing to the vertex containing that key. The keys are arranged in a predetermined order, such as address order, across the vertices of each level such that a search into a given hierarchy vertex (node) is directed into a specific group of keys. The search tree is structured so that all search paths are the same length. Nodes or vertices may be inserted and deleted at most levels while maintaining the equal length of all search paths. The tree described in our aforementioned application also employs perfect matching techniques, which seek a perfect match between the input or query key and the key being sought. If a perfect match is not found, the search reports a false return.
In some environments, longest prefix searching is used in conjunction with perfect matching to simplify the search procedure. Longest prefix searching involves finding the prefix of an address in the tree that contains the longest most significant string of bits matching a key of the input query. One example of a data structure using both longest prefix searching and perfect matching is Content Addressable Memory (CAM). However, CAM requires more memory than other data structures, and longest prefix searching requires external prefix sorting. External prefix sorting eliminates the dynamic editing features of the lookup table, decreases performance and increases external communication and control requirements.
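As an illustration of the longest prefix searching defined above, the following minimal sketch scans a small table linearly and keeps the longest stored prefix that matches the most significant bits of the query address. The function name, the table contents and the representation of bit strings as text are assumptions made for illustration only; this is not the search tree or CAM structure discussed in this document.

```python
from typing import Dict, Optional, Tuple


def longest_prefix_match(address: str, table: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return the (prefix, data) entry whose prefix is the longest
    leading bit string of `address`, or None if no prefix matches.

    Bit strings are written as text such as "1011", most significant
    bit first (an illustrative assumption).
    """
    best = None
    for prefix, data in table.items():
        # A stored prefix matches when it equals the leading bits of the address.
        if address.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, data)
    return best


# Hypothetical table: both "10" and "1011" match the address below,
# and the longer of the two is selected.
table = {"10": "port A", "1011": "port B", "0": "port C"}
match = longest_prefix_match("10110001", table)
```

A linear scan of this kind touches every entry, which is the cost that tree-based and hardware lookup structures are intended to avoid.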
There are difficulties with longest prefix searching. The prefixes may be any length, up to W−1, where W is the maximum length of an address. Since W=32 in IPv4 Protocol and W=128 in IPv6 Protocol, the prefix may be any length between 1 and 31 bits in IPv4 Protocol and between 1 and 127 bits in IPv6 Protocol. Thus, a vast number of variable-length prefixes exist in both IPv4 and IPv6 Protocols. The number of memory accesses required by traditional perfect match searching of variable-length prefixes increases with the number of prefixes. Consequently, the number of memory accesses can be large, thereby slowing the search process.
V. Srinivasan et al., in “Fast Address Lookups Using Controlled Prefix Expansion”, ACM Transactions on Computer Systems, Vol. 17, No. 1, February 1999, pp. 1-40, noted that worst case search delays are directly related to the number of distinct prefix lengths, and that faster searching could be accomplished by reducing the number of prefix lengths. Noting that current IPv4 Protocol results in 25 distinct prefix lengths between 8 and 32 bits, Srinivasan et al. proposed a system of controlled prefix expansion that selectively adds bits to shorter prefixes to minimize the number of prefix lengths. In an example of seven prefix lengths, the controlled expansion proposed by Srinivasan et al. resulted in a reduction to three prefix lengths. Principally, Srinivasan et al. added both a “1” and a “0” to a shorter prefix. If an existing prefix was duplicated by the expansion of a shorter prefix, only the non-conflicting expansion was employed. The result, however, was an expansion of the prefix table, with several expanded prefixes representing a single unexpanded prefix. Moreover, prefix insertion and deletion were complicated by the need for non-duplication in the expansion.
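The expansion step described above can be sketched as follows. This is a simplified illustration written for this discussion, not code from the cited paper: the function name, the text representation of bit strings and the sample table are assumptions, and the sketch assumes the largest target length is at least as long as the longest prefix in the table.

```python
from itertools import product
from typing import Dict, Iterable


def expand_prefixes(table: Dict[str, str], target_lengths: Iterable[int]) -> Dict[str, str]:
    """Expand every prefix to the nearest allowed length that is not shorter
    than the prefix, by appending every combination of "0" and "1" bits.

    When an expansion would duplicate a prefix that came from a longer
    original prefix, the longer original keeps the entry.
    """
    targets = sorted(target_lengths)
    expanded: Dict[str, str] = {}
    # Handle longer originals first so that they win any conflict.
    for prefix, data in sorted(table.items(), key=lambda item: -len(item[0])):
        # Assumes some target length is >= len(prefix).
        target = next(t for t in targets if t >= len(prefix))
        for bits in product("01", repeat=target - len(prefix)):
            candidate = prefix + "".join(bits)
            # setdefault leaves an existing (longer-origin) entry untouched.
            expanded.setdefault(candidate, data)
    return expanded


# Hypothetical table with prefix lengths 1, 2 and 3, reduced to lengths 1 and 3.
# "10" expands to "100" and "101"; "101" already exists, so only "100" is added.
original = {"1": "A", "10": "B", "101": "C", "0": "D"}
reduced = expand_prefixes(original, [1, 3])
```

The sketch also shows the drawback noted above: one unexpanded prefix can turn into several table entries, and deleting an original prefix requires knowing which expanded entries it produced.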
The present invention is directed to an improvement of the search tree described in our aforementioned application, and particularly to a prefix search technique employing prefixes of a single length for faster searching.
In accordance with one aspect of the present invention, a process is provided for selecting a prefix in a collection that has a longest matching subprefix to a search prefix. A prefix search tree contains a plurality of vertices arranged in levels, with each bottom vertex containing a plurality of binary prefixes and each hierarchy vertex containing a binary prefix from each child vertex. A binary search prefix is input to the root vertex, and is compared to the prefixes in selected hierarchy vertices. A bit is set in a search mask based on a least significant bit of a bit string in the search prefix that matches a longest bit string in a prefix in each vertex. A longest matching subprefix is selected from a string of most significant bits of the search prefix based on the lowest significant bit set in the search mask.
One feature of the invention is that the longest matching subprefix is equal in bit length to the mask and to the prefixes in the tree. If the selected longest matching subprefix contains fewer than the prescribed number of bits, the subprefix is filled with empty bit positions.
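The selection and filling steps can be illustrated with a minimal sketch under stated assumptions. Here the search mask is assumed to be a text string of the same length as the search prefix, most significant position first, in which a "1" at a position marks a candidate subprefix ending at that bit. The "*" symbol standing for an empty bit position, the function name and the sample values are likewise illustrative assumptions and are not taken from this description.

```python
from typing import Optional


def select_longest_subprefix(search_prefix: str, search_mask: str, empty: str = "*") -> Optional[str]:
    """Select the most significant bits of `search_prefix` up to the lowest
    significant set bit of `search_mask`, then fill the remaining positions
    with the `empty` symbol so the result keeps the full length.

    Returns None when no bit is set in the mask.
    """
    if len(search_prefix) != len(search_mask):
        raise ValueError("search prefix and search mask must have equal length")
    # The lowest significant set bit is the rightmost "1" in the mask.
    last = search_mask.rfind("1")
    if last < 0:
        return None
    kept = search_prefix[: last + 1]
    return kept + empty * (len(search_prefix) - len(kept))


# Hypothetical 8-bit example: mask bits are set at the 2nd and 4th positions,
# so the 4-bit subprefix "1011" is selected and filled out to 8 positions.
subprefix = select_longest_subprefix("10110001", "01010000")
```

Keeping the result at a fixed length means it can be compared against fixed-length entries without tracking a separate length field, at the cost of needing a representation for empty positions.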
In one embodiment of the invention, each prefix in the collection has an associated prefix mask that represents its common subprefixes in the collection. A special mask is constructed for each prefix in each vertex, the special mask being based on a comparison of the prefix in the vertex and the search prefix. The search mask is constructed based on a comparison of the special masks and the prefix masks. Thus, the search mask is based on the search prefix, the prefix in the vertex and the common subprefixes in the collection.
In accordance with the invention, if the longest matching subprefix matches a key in the collection, data associated with the key are output. If the longest matching subprefix does not exactly match a key in the collection, the longest matching subprefix is re-input to the search process to locate data associated with the longest matching subprefix in the collection.
One aspect of the invention resides in the arrangement of entries at each level in a predetermined order, such as in prefix value order. The search is conducted through the tree for the longest matching subprefix, and data associated with a prefix matching the longest matching subprefix are output.
Another aspect of the present invention resides in a process for creating the prefix mask for each prefix in each hierarchy and bottom vertex. Each prefix mask is based on the respective prefix and the vertex in which it is found.
The invention is carried out by a computer by embodying the invention in computer readable program code in a memory readable by the computer.