1. Technical Field
This invention relates to a data structure apparatus in the form of a binary digital tree and method of searching and modifying the data structure. More particularly, this invention relates to building a routing table that allows both specific and general entries, and incorporates a data structure in the form of a modified Practical Algorithm to Retrieve Information Coded in Alphanumeric Tree ("Patricia Tree") for building, searching and modifying the routing table with wildcard support. The invention further incorporates procedures for searching and modifying the wildcard routing table, wherein the procedures include filters and flags for focusing the scope of the search, and insert and delete procedures for modifying the table without affecting the integrity of any ongoing searches.
2. Description of the Prior Art
In recent years the world has come into the electronic era. A global network of interconnected computers now allows people from all over the world to communicate via electronic mail messages and to establish locations on the network for dissemination of information. As the network continues to experience rapid growth, however, sending and receiving messages in a reasonable amount of time is becoming more difficult.
Sending of electronic mail and exploring the global electronic network require proper routing of network messages to their intended destinations. Almost all transactions conducted on the global electronic network involve an exchange of messages between one computer and another. Every computer connected into the global computer network has at least one network address, and similar to a postal address for standard mail delivery, the network address is necessary to accommodate correct delivery of electronic messages through the network. When an application on an individual computer sends a network message, there are three possible scenarios that may occur: the message could be addressed to another application that runs on the same computer, the message could be addressed to a different computer that can be accessed directly, or the message could be addressed to a distant computer that requires the assistance of the global electronic network to be reached. In a conventional application, the first scenario is equivalent to handing a letter to a person who lives in the same house, the second scenario is equivalent to carrying a letter to a person who lives in the same building or neighborhood, and the third scenario is equivalent to sending a letter to the post office for delivery. Accordingly, for each of the scenarios the network must decide which case is applicable to the message and take the appropriate action.
Each computer recognizes its own address(es) and delivers internal messages immediately. A standard desktop computer has one network interface for direct connection to a local network. The standard network interface for a personal computer is in the form of an Ethernet card or a dial-up modem. In addition, a desktop computer may be a part of a local area network ("LAN") wherein messages addressed to any of the other computers on the LAN remain within the LAN. These messages are recognized as being targeted at addresses that are part of the LAN and are delivered within the LAN. However, for messages that are not being transmitted to addresses within a LAN, the computer must locate an appropriate gateway for message delivery, wherein the gateway is a computer within the network that accepts messages for delivery to more distant locations. Use of the proper gateway for sending messages to distant computers is paramount for timely delivery of messages. The routing of messages involves effective selection of an outbound network interface. Frequently, the network traffic within the global electronic network favors the selection of gateways for routing of messages to the proper end destination. This may translate into the use of multiple gateways which act as delivery conduits through which the messages pass prior to reaching the intended destination. Accordingly, with the abundance of network addresses, a simple table with an entry for every address within the global electronic network is neither an effective nor an efficient tool for managing delivery of messages over the global electronic network.
Conventional tables for storing data, such as words and numbers, contain only exact entries. Some tables only support searches for exact and complete values, while other tables support searches for inexact values as well. An inexact value may come in the form of a wildcard which can stand for any symbol or string of symbols. Routing tables generally contain inexact entries and support only searching with an exact target, wherein the search always begins with an exact and complete address. Accordingly, in using wildcard values in a routing table it is important to develop and/or utilize a data structure for efficiently building the tables.
Data structures in the form of trees are known as efficient tools for building routing tables and supporting searches beginning with a known prefix. A tree is a data structure accessed first at the root node. Each subsequent node can be either an internal node with further subsequent nodes or an external node with no further nodes existing under the node. An internal node refers to or has links to one or more descending or child nodes and is referred to as the parent of its child nodes, and external nodes are commonly referred to as leaves. The root node is usually depicted at the top of the tree structure and the external nodes are depicted at the bottom.
Tree structures are often defined by the characteristics of the tree. For example, a Binary Tree is a tree with at most two children for each node. A Digital Tree is a rooted tree where the leaves represent strings of digital symbols. The Patricia Tree is a Digital Tree with suppression of one-way branching that prohibits keys which are strict prefixes of other branches. In general, a Patricia Tree is always a digital tree, but only a binary tree when the symbol alphabet is binary. The internal nodes represent a common prefix to a set of strings, and each child of that node corresponds to a choice of the next symbol to follow the common prefix. A Patricia Tree can take the form of both a Binary Tree and a Digital Tree where all internal nodes have at least two children.
As mentioned above, Patricia is an acronym for "Practical Algorithm to Retrieve Information Coded in Alphanumeric," and the Patricia Tree is suitable for dealing with extremely long variable length keys such as titles or phrases stored within a large bulk file. The Patricia Tree adheres to two primary concepts. The first of these concepts is the concept of semi-infinite strings. These are strings with a particular starting position in a document which are then considered to continue indefinitely in the forward direction of the string. The second concept is that of being based on symbol-by-symbol comparison of data. In an algorithm developed for traversing such a tree, the decision on traversal direction is taken based on the value of the alphabetic symbol currently in consideration.
Within the Patricia Tree structure, internal nodes where there exists only one choice of the next symbol are omitted from the data structure. Patricia Trees keep track of the missing nodes by recording the distance from the beginning of the string at every node of the tree. The basic idea behind a Patricia Tree is to build a Digital Tree that avoids one-way branching by including in each node the number of symbols to skip over before making the next test. A Patricia Tree does not search for strict equality between key and argument; rather, it determines whether or not there exists a key beginning with the argument and proceeds from there. More specifically, the Patricia Tree considers a single symbol at each internal node, and makes a comparison for string equality only at an external node. Accordingly, since traditional routing of electronic messages on a global computer network is based on sets of addresses with common prefixes, Patricia Trees are a well known and widely used method for building network routing tables.
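The behavior described above can be illustrated with a minimal sketch of a conventional binary Patricia Tree. It is not the modified tree of the invention. It assumes fixed-length keys written as strings of '0' and '1', and the names Leaf, Internal, insert and search are hypothetical. Each internal node records the position of the single symbol it tests, and string equality is checked only at a leaf.

```python
class Leaf:
    """External node holding one complete key."""
    def __init__(self, key):
        self.key = key


class Internal:
    """Internal node that tests exactly one symbol position."""
    def __init__(self, index, zero, one):
        self.index = index                  # distance from the start of the string
        self.child = {"0": zero, "1": one}  # always two children: no one-way branching


def descend(root, key):
    """Follow the tested symbols only, skipping all untested positions."""
    node = root
    while isinstance(node, Internal):
        node = node.child[key[node.index]]
    return node


def search(root, key):
    """Compare for string equality only at the external node reached."""
    if root is None:
        return False
    return descend(root, key).key == key


def insert(root, key):
    """Insert a key (assumed to have the same length as all others)."""
    if root is None:
        return Leaf(key)
    nearest = descend(root, key)
    if nearest.key == key:
        return root
    # First position at which the new key differs from the nearest stored key.
    diff = next(i for i in range(len(key)) if key[i] != nearest.key[i])
    # Walk down again to the place where a test on position diff belongs.
    parent, node = None, root
    while isinstance(node, Internal) and node.index < diff:
        parent, node = node, node.child[key[node.index]]
    leaf = Leaf(key)
    branch = Internal(diff, leaf, node) if key[diff] == "0" else Internal(diff, node, leaf)
    if parent is None:
        return branch
    parent.child[key[parent.index]] = branch
    return root


root = None
for k in ["0010", "0011", "1000", "1011"]:
    root = insert(root, k)
```

In this sketch the four keys produce three internal nodes, testing positions 0, 3 and 2, rather than one node per symbol position.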
All Digital Trees, including Patricia Trees, are effective at finding prefixes of strings. However, such trees require special treatment to record a string which is also a prefix of other strings. In a Binary Tree there are only two symbols, 0 and 1, and they both appear at any point in a binary string. There are no symbols to reserve for an end marker to a string, and enlarging the alphabet to add one more symbol doubles the size of the strings in computer applications since two bits must be used for every symbol instead of one. There are other encoding techniques that are more efficient in space, but they radically transform the original binary data. Accordingly, it is desirable to use an internal symbol to identify the end of a data string.
There have been recent modifications to the applications of search trees for addressing the issue of Internet Protocol ("IP") address lookup. The Lampson et al. document, "IP Lookups Using Multiway and Multicolumn Search," shows how a binary search can be adapted for solving the best matching prefix problem. The basic binary search technique requires encoding a prefix as the start and end of a range, and precomputing the best-matching prefix associated with a range. The search includes a binary search on the number of possible prefixes as opposed to the number of prefix lengths. The data structure is encoded using both the start and end range of the data strings supported in the table, and effectively partitions the single binary search table into multiple binary search tables for each value of the first x bits. Accordingly, Lampson et al. restructure the conventional binary tree data structure to allow multiway searching instead of binary searching.
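The idea of encoding a prefix as a range and precomputing the best-matching prefix can be sketched as follows. This is a simplified illustration and not the table layout of Lampson et al.; the 8-bit address width and the names WIDTH, prefix_range, build and lookup are assumptions made for the example.

```python
from bisect import bisect_right

WIDTH = 8  # assumed address width in bits, chosen only for illustration


def prefix_range(prefix):
    """Encode a bit-string prefix as the (start, end) of the addresses it covers."""
    pad = WIDTH - len(prefix)
    start = int(prefix + "0" * pad, 2)
    end = int(prefix + "1" * pad, 2)
    return start, end


def build(prefixes):
    """Precompute the longest covering prefix for each elementary interval."""
    ranges = {p: prefix_range(p) for p in prefixes}
    starts = {s for s, _ in ranges.values()}
    after_ends = {e + 1 for _, e in ranges.values()}
    points = sorted(starts | after_ends)
    best = []
    for point in points:
        covering = [p for p, (s, e) in ranges.items() if s <= point <= e]
        best.append(max(covering, key=len) if covering else None)
    return points, best


def lookup(table, address):
    """Binary search over the interval boundaries, then read the stored answer."""
    points, best = table
    i = bisect_right(points, address) - 1
    return best[i] if i >= 0 else None


table = build(["1", "10", "1011"])
```

With this table, lookup(table, 0b10110001) selects the prefix "1011", while an address such as 0b11001000 falls back to the shorter prefix "1".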
The Sklower document, "A Tree-Based Packet Routing Table for Berkeley Unix," discloses assembling a collection of prototype addresses into a variant of a Patricia Tree, which is a binary radix tree with one-way branching removed. The tree has internal nodes and external nodes, referred to as leaves, wherein the leaves represent address classes and contain information common to all possible destinations in each class. The leaves contain a prototype address and at least one mask, i.e. a pattern indicating which of the bits of the prototype address are relevant and which bits are wildcarded. The searching technique disclosed is a variant of a Patricia Tree with backtracking for general masks, when appropriate. However, Patricia Trees may only be efficient for supporting tables with wildcards wherein the wildcarded bits are isolated at the end of the prototype address. Accordingly, what is desirable is a modification to the Patricia Tree to efficiently support wildcard masks within the prototype address.
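The role of the prototype address and mask can be sketched with a simple matching test. This is an illustration only: it scans a flat list rather than a tree, and the function names, the 8-bit values and the entry labels are hypothetical. A mask bit of 1 marks a relevant bit of the prototype address, and a mask bit of 0 marks a wildcarded bit.

```python
def matches(address, prototype, mask):
    """True when the address agrees with the prototype on every relevant bit."""
    return (address & mask) == (prototype & mask)


def most_specific(address, entries):
    """Among matching entries, pick the one with the most relevant bits."""
    hits = [e for e in entries if matches(address, e[0], e[1])]
    return max(hits, key=lambda e: bin(e[1]).count("1"), default=None)


# Each entry is (prototype, mask, label); the labels are made up.
entries = [
    (0b10100000, 0b11110000, "A"),  # wildcards isolated at the end
    (0b10100101, 0b11111111, "B"),  # exact entry, no wildcards
    (0b00000101, 0b00001111, "C"),  # wildcards at the start
]
```

Entry "C" shows the case that a prefix-oriented tree handles poorly: its wildcarded bits are not isolated at the end of the prototype address.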
Doeringer et al., Waldvogel et al., Degermark et al., Nilsson et al., and Srinivasan et al. each disclose techniques for building and searching the routing table. Each of the techniques focuses on the problem of Internet routing and is therefore limited to searching for address ranges with a common prefix. Reading the routing tables is efficient; however, updating the tables generally requires building an entire new table and then replacing the existing table with the new table. Accordingly, since large server computers with rapidly changing sets of connected clients must update routing tables frequently, the data structures disclosed by Doeringer et al., Waldvogel et al., Degermark et al., Nilsson et al., and Srinivasan et al. are not appropriate for these large computers.
Accordingly, what is desirable is a data structure that allows both specific and general data entries and selects the most specific data for matching purposes. Such a data structure must be efficiently consulted for every network message, while allowing the contents of the data structure to change at a slower pace. The data structure must be especially efficient on large, shared-memory multiprocessor computers and should not be too strictly specialized for network routing problems so that it can be applied to other searching and matching techniques. In addition, the data structure must support concurrent reading among multiprocessors as well as support updating of the data structure while reading of the data structure is taking place. Accordingly, an efficient data structure is desirable for use on multiprocessor computers in conjunction with a read-copy update procedure which supports reading in conjunction with table updating without delay or interference from changes to the structure contents.
It is therefore an object of the invention to provide a digital tree in the form of a modified Patricia Tree for combining a prototype address and a mask into a ternary data string. It is a further object of the invention to provide a method for searching the modified Patricia Tree of the invention. It is a further object of the invention to provide a method for modifying the modified Patricia Tree of the invention. It is even a further object of the invention to provide a method for removing nodes from the data structure. Other objects of the invention include providing a computer system and article of manufacture for use with the search tree of the invention.
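One way to picture the combination of a prototype address and a mask into a ternary data string is the sketch below. The encoding shown (most significant bit first, with '*' standing for a wildcarded bit) and the function name to_ternary are assumptions made for illustration; this section of the text does not fix a particular encoding.

```python
def to_ternary(prototype, mask, width):
    """Combine a prototype address and a mask into a string over '0', '1', '*'."""
    symbols = []
    for bit in range(width - 1, -1, -1):      # most significant bit first (assumed)
        if (mask >> bit) & 1:
            symbols.append(str((prototype >> bit) & 1))  # relevant bit: keep its value
        else:
            symbols.append("*")               # wildcarded bit
    return "".join(symbols)
```

Under these assumptions, a prototype of 0b10100000 with mask 0b11110000 becomes the string 1010****, and a prototype of 0b00000101 with mask 0b00001111 becomes ****0101.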
The invention resides in a search tree data structure which can be used to classify data in a computer system. The search tree has multiple internal nodes, and each internal node includes at least four pointer fields. At least two of the pointer fields correspond to specific alphabetic values, which are preferably (but not necessarily) bit values. A third, "wildcard" pointer field corresponds to all of the alphabetic values. A fourth, "epsilon" pointer field corresponds to the data string ending at a specific length. Each internal node includes pointers in at least two of the four pointer fields, which guarantees that the search tree provides at least two-way branching at each internal node.
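The node layout summarized above can be sketched as follows for a binary alphabet. The class and field names are hypothetical, and the sketch shows only the four pointer fields and the two-way branching condition, not the search, insert or delete procedures.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    """External node; the key is assumed to be a ternary string."""
    key: str


@dataclass
class InternalNode:
    """Internal node with four pointer fields for a binary alphabet."""
    zero: Optional["Node"] = None      # next symbol is the specific value 0
    one: Optional["Node"] = None       # next symbol is the specific value 1
    wildcard: Optional["Node"] = None  # next symbol stands for all values
    epsilon: Optional["Node"] = None   # the data string ends at this length

    def branching(self) -> int:
        """Number of pointer fields that actually hold a pointer."""
        fields = (self.zero, self.one, self.wildcard, self.epsilon)
        return sum(child is not None for child in fields)

    def is_valid(self) -> bool:
        """At least two pointers are required, so branching is at least two-way."""
        return self.branching() >= 2


Node = Union[Leaf, InternalNode]

node = InternalNode(zero=Leaf("10"), wildcard=Leaf("1*"))
```

A node with a pointer in only one field, such as InternalNode(one=Leaf("11")), fails the is_valid check, which corresponds to one-way branching that the tree is meant to exclude.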
The invention also resides in a method for classifying data using the data structure summarized above. A preferred searching method incorporates filters and flags for focusing the parameters and for enabling searching data strings of incomplete values. A preferred insertion method ensures that each node within the data structure has at least two-way branching from a previous node.