The present invention relates generally to data communication networks and more particularly relates to a method of searching that utilizes a longest match based Radix Search Trie with variable length keys, wherein keys are permitted to be prefixes of one another.
Currently, there is a growing trend to make Asynchronous Transfer Mode (ATM) networking technology the base of future global communications. ATM has already been adopted as a standard for broadband communications by the International Telecommunications Union (ITU) and by the ATM Forum, a networking industry consortium.
ATM originated as a telecommunication concept defined by the Comité Consultatif International Télégraphique et Téléphonique (CCITT), now known as the ITU, and the American National Standards Institute (ANSI) for carrying user traffic on any User to Network Interface (UNI) and to facilitate multimedia networking between high speed devices at multi-megabit data rates. ATM is a method for transferring network traffic, including voice, video and data, at high speed. Using this connection oriented switched networking technology centered around a switch, a great number of virtual connections can be supported by multiple applications through the same physical connection. The switching technology enables bandwidth to be dedicated for each application, overcoming the problems that exist in a shared media networking technology, like Ethernet, Token Ring and Fiber Distributed Data Interface (FDDI). ATM allows different types of physical layer technology to share the same higher layer, the ATM layer.
More information on ATM networks can be found in the book "ATM: The New Paradigm for Internet, Intranet and Residential Broadband Services and Applications," Timothy Kwok, Prentice Hall, 1998.
ATM uses very short, fixed length packets called cells. The first five bytes, called the header, of each cell contain the information necessary to deliver the cell to its destination. The cell header also provides the network with the ability to implement congestion control and traffic management mechanisms. The fixed length cells offer smaller and more predictable switching delays as cell switching is less complex than variable length packet switching and can be accomplished in hardware for many cells in parallel. The cell format also allows for multi-protocol transmissions. Since ATM is protocol transparent, the various protocols can be transported at the same time. With ATM, phone, fax, video, data and other information can be transported simultaneously.
ATM is a connection oriented transport service. To access the ATM network, a station requests a virtual circuit between itself and other end stations, using the signaling protocol to the ATM switch. ATM provides the User Network Interface (UNI) which is typically used to interconnect an ATM user with an ATM switch that is managed as part of the same network.
The current standard solution for routing in a private ATM network is described in Private Network Node Interface (PNNI) Phase 0 and Phase 1 specifications published by ATM Forum. The previous Phase 0 draft specification is referred to as Interim Inter-Switch Signaling Protocol (IISP). The goal of the PNNI specifications is to provide customers of ATM network equipment some level of multi-vendor interoperability.
As part of the ongoing enhancement to the ATM standard by work within the ATM Forum and other groups, the Private Network to Network Interface (PNNI) protocol Phase 1 has been developed for use between private ATM switches and between groups of private ATM switches. The PNNI specification includes two categories of protocols. The first protocol is defined for the distribution of topology information between switches and clusters of switches where the information is used to compute routing paths within the network. The main feature of the PNNI hierarchy mechanism is its ability to automatically configure itself within the networks in which the address structure reflects the topology. The PNNI topology and routing techniques are based on the well known link state routing technique.
The second protocol is effective for signaling, i.e., the message flows used to establish point-to-point and point-to-multipoint connections across the ATM network. This protocol is based on the ATM Forum User to Network Interface (UNI) signaling with mechanisms added to support source routing, crankback and alternate routing of source SETUP requests in the case of bad connections.
With reference to the PNNI Phase 1 specifications, the PNNI hierarchy begins at the lowest level wherein the lowest level nodes are organized into peer groups. A logical node in the context of the lowest hierarchy level is the lowest level node. A logical node is typically denoted simply as a node. A peer group is a collection of logical nodes wherein each node within the group exchanges information with the other members of the group such that all members maintain an identical view of the group. When a logical node becomes operational, the nodes attached to it initiate and exchange information via a well known Virtual Channel Connection (VCC) used as a PNNI Routing Control Channel (RCC).
Hello messages are sent periodically by each node on this link. In this fashion, the Hello protocol makes the two neighboring nodes known to each other. Each node exchanges Hello packets with its immediate neighbors to determine its neighbor's local state information. The state information includes the identity and peer group membership of the node's immediate neighbors and a status of its links to its neighbors. Each node then bundles its state information in one or more PNNI Topology State Elements (PTSEs) which are subsequently flooded throughout the peer group.
PTSEs are the smallest collection of PNNI routing information that is flooded as a unit among all logical nodes within a peer group. A node topology database consists of a collection of all PTSEs received, which represent that particular node's present view of the PNNI routing topology. In particular, the topology database provides all the information required to compute a route from the given source node to any destination address reachable in or through that routing domain.
When neighboring nodes at either end of a logical link begin initializing through the exchange of Hellos, they may conclude that they are in the same peer group. If it is concluded that they are in the same peer group, they proceed to synchronize their topology databases. Database synchronization includes the exchange of information between neighboring nodes resulting in the two nodes having identical topology databases. A topology database includes detailed topology information about the peer group in which the logical node resides in addition to more abstract topology information representing the remainder of the PNNI routing domain.
During a topology database synchronization, the nodes in question first exchange PTSE header information, i.e., they advertise the presence of PTSEs in their respective topology databases. When a node receives PTSE header information that advertises a more recent PTSE version than the one that it has already or advertises a PTSE that it does not yet have, it requests the advertised PTSE and updates its topology database with the subsequently received PTSE. If the newly initialized node connects to a peer group then the ensuing database synchronization reduces to a one way topology database copy. A link is advertised by a PTSE transmission only after the database synchronization between the respective neighboring nodes has successfully completed. In this fashion, the link state parameters are distributed to all topology databases in the peer group.
Flooding is the mechanism used for advertising links whereby PTSEs are reliably propagated node by node throughout a peer group. Flooding ensures that all nodes in a peer group maintain identical topology databases. A short description of the flooding procedure follows. PTSEs are encapsulated within PNNI Topology State Packets (PTSPs) for transmission. When a PTSP is received its component PTSEs are examined. Each PTSE is acknowledged by encapsulating information from its PTSE header within the acknowledgment packet which is sent back to the sending neighbor. If the PTSE is new or of more recent origin than the node's current copy, the PTSE is installed in the topology database and flooded to all neighboring nodes except the one from which the PTSE was received. A PTSE sent to a neighbor is periodically retransmitted until acknowledged.
Note that flooding is an ongoing activity wherein each node issues PTSPs with PTSEs that contain updated information. The PTSEs contained in the topology databases are subject to aging and are removed after a predefined duration if they are not refreshed by a new incoming PTSE. Only the node that originated a particular PTSE can re-originate that PTSE. PTSEs are reissued both periodically and on an event driven basis.
As described previously, when a node first learns about the existence of a neighboring peer node which resides in the same peer group, it initiates the database exchange process in order to synchronize its topology database with that of its neighbor. The database exchange process involves exchanging a sequence of database summary packets which contain the identifying information of all PTSEs in a node topology database. The database summary packets are exchanged utilizing a lock step mechanism whereby one side sends a database summary packet and the other side responds with its own database summary packet, thus acknowledging the received packet.
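By way of illustration only, the install-and-forward rule of the flooding procedure can be sketched as follows. The class `FloodNode`, the helper `connect` and the use of an integer sequence number to stand for "more recent" are assumptions made for the sketch and do not appear in the PNNI specification; encapsulation into PTSPs and periodic retransmission until acknowledged are omitted.

```python
class FloodNode:
    """Hypothetical stand-in for a logical node in a peer group."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []   # directly attached FloodNode objects
        self.database = {}    # PTSE id -> sequence number of the stored copy
        self.acks_sent = []   # (neighbor name, PTSE id, sequence number)

    def receive(self, sender, ptse_id, seq):
        # Each received PTSE is acknowledged to the neighbor that sent it.
        if sender is not None:
            self.acks_sent.append((sender.name, ptse_id, seq))
        # Install and flood only if the PTSE is new or more recent.
        if seq > self.database.get(ptse_id, -1):
            self.database[ptse_id] = seq
            for neighbor in self.neighbors:
                if neighbor is not sender:   # never back to the sender
                    neighbor.receive(self, ptse_id, seq)


def connect(a, b):
    """Create a bidirectional link between two nodes."""
    a.neighbors.append(b)
    b.neighbors.append(a)


node_a, node_b, node_c = FloodNode("A"), FloodNode("B"), FloodNode("C")
connect(node_a, node_b)
connect(node_b, node_c)
connect(node_c, node_a)
node_a.receive(None, "ptse-1", 1)   # node A originates a PTSE
```

Because a PTSE is forwarded only when it is strictly newer than the stored copy, the propagation terminates even though the three nodes form a loop.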
When a node receives a database summary packet from its neighboring peer, it first examines its topology database for the presence of each PTSE described within the packet. If the particular PTSE is not found in its topology database or if the neighboring peer has a more recent version of the PTSE then the node requests the PTSE from the particular neighboring peer or optionally from another neighboring peer whose database summary indicates that it has the most recent version of the PTSE.
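The comparison described above can be sketched, under simplifying assumptions, as a function that builds the list of PTSEs to request. The function name `build_request_list` and the representation of a database summary as a mapping from a PTSE identifier to an integer sequence number are hypothetical and serve only to illustrate the rule.

```python
def build_request_list(local_db, summary):
    """Return the PTSE ids to request, given a neighbor's database summary.

    Both arguments map a PTSE id to a sequence number; a larger number
    stands for a more recent instance of the PTSE.
    """
    return [ptse_id for ptse_id, seq in summary.items()
            if ptse_id not in local_db or local_db[ptse_id] < seq]


local_db = {"p1": 3, "p2": 1}
summary = {"p1": 3, "p2": 2, "p3": 1}
requests = build_request_list(local_db, summary)   # "p2" is stale, "p3" is missing
```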
A corresponding neighboring peer data structure is maintained by the nodes located on either side of the link. The neighboring peer data structure includes information required to maintain database synchronization and flooding to neighboring peers.
It is assumed that the nodes on either side of the link begin in the Neighboring Peer Down state. This is the initial state of the neighboring peer for this particular state machine. This state indicates that there are no active links through the neighboring peer. In this state, there are no adjacencies associated with the neighboring peer either. When the link reaches the point in the Hello protocol where both nodes are able to communicate with each other, the event AddPort is triggered in the corresponding neighboring peer state machine. Similarly, when a link falls out of communication with both nodes, the event DropPort is triggered in the corresponding neighboring peer state machine. The database exchange process commences with the AddPort event, which is triggered only after the first link between the two neighboring peers is up. When the DropPort event for the last link between the neighboring peers occurs, the neighboring peer state machine internally generates the DropPortLast event, causing all state information for the neighboring peer to be cleared.
It is while in the Negotiating state that the first step is taken in creating an adjacency between two neighboring peer nodes. During this step it is decided which node is the master and which is the slave, and it is also in this state that an initial Database Summary (DS) sequence number is decided upon. Once the negotiation has been completed, the Exchanging state is entered. In this state the node describes its topology database to the neighboring peer by sending database summary packets to it.
After the peer processes the database summary packets, the missing or updated PTSEs can then be requested. In the Exchanging state the database summary packets contain summaries of the topology state information contained in the node's database. In the case of logical group nodes, those portions of the topology database that were originated or received at the level of the logical group node or at higher levels are included in the database summary. The PTSP and PTSE header information of each such PTSE is listed in one of the node's database summary packets. PTSEs for which new instances are received after the Exchanging state has been entered need not be included in any database summary packet since they will be handled by the normal flooding procedures.
The incoming database summary packet on the receive side is associated with a neighboring peer via the interface over which it was received. Each database summary packet has a database summary sequence number that is implicitly acknowledged. For each PTSE listed, the node looks up the PTSE in its database to see whether it also has an instance of that particular PTSE. If it does not or if the database copy is less recent, then the node either re-originates the newer instance of the PTSE or flushes the PTSE from the routing domain after installing it in the topology database with a remaining lifetime set accordingly.
Alternatively, if the listed PTSE has expired, the PTSP and PTSE header contents in the PTSE summary are accepted as a newer or updated PTSE with empty contents. If the PTSE is not found in the node's topology database, the particular PTSE is put on the PTSE request list so it can be requested from a neighboring peer via one or more PTSE request packets.
If the PTSE request list from a node is empty, the database synchronization is considered complete and the node moves to the Full state.
However, if the PTSE request list is not empty, then the Loading state is entered once the node's last database summary packet has been sent. At this point, the node knows which PTSEs need to be requested. The PTSE request list contains a list of those PTSEs that need to be obtained in order to synchronize that particular node's topology database with the neighboring peer's topology database. To request these PTSEs, the node sends a PTSE request packet which contains one or more entries from the PTSE request list. PTSE request packets are only sent during the Exchanging state and the Loading state. The node can send a PTSE request packet to a neighboring peer and optionally to any other neighboring peers that are also in either the Exchanging state or the Loading state and whose database summaries indicate that they have the missing PTSEs.
The received PTSE request packets specify a list of PTSEs that the neighboring peer wishes to receive. For each PTSE specified in the PTSE request packet, its instance is looked up in the node's topology database. The requested PTSEs are subsequently bundled into PTSPs and transmitted to the neighboring peer. Once the last PTSE on the PTSE request list has been received, the node moves from the Loading state to the Full state. Once the Full state has been reached, the node has received all PTSEs known to be available from its neighboring peer and links to the neighboring peer can now be advertised within PTSEs.
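The progression through the neighboring peer states described above can be summarized in a small transition function. This is a simplified sketch only: the state names follow the text, but the event names `NegotiationDone`, `ExchangeDone` and `LoadingDone` are hypothetical labels introduced for the sketch, and the direct transition from the Neighboring Peer Down state to the Negotiating state on AddPort is an assumption drawn from the description rather than from the specification itself.

```python
from enum import Enum, auto


class PeerState(Enum):
    NP_DOWN = auto()       # Neighboring Peer Down
    NEGOTIATING = auto()
    EXCHANGING = auto()
    LOADING = auto()
    FULL = auto()


def next_state(state, event, request_list_empty=True):
    """Return the state that follows `state` when `event` occurs."""
    if event == "DropPortLast":
        return PeerState.NP_DOWN          # all state information is cleared
    if state is PeerState.NP_DOWN and event == "AddPort":
        return PeerState.NEGOTIATING
    if state is PeerState.NEGOTIATING and event == "NegotiationDone":
        return PeerState.EXCHANGING
    if state is PeerState.EXCHANGING and event == "ExchangeDone":
        # An empty PTSE request list means synchronization is complete.
        return PeerState.FULL if request_list_empty else PeerState.LOADING
    if state is PeerState.LOADING and event == "LoadingDone":
        return PeerState.FULL
    return state                          # event is ignored in this state


state = PeerState.NP_DOWN
for event in ("AddPort", "NegotiationDone"):
    state = next_state(state, event)
state = next_state(state, "ExchangeDone", request_list_empty=False)
loading_state = state
state = next_state(state, "LoadingDone")
```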
A major feature of the PNNI specification is the routing algorithm used to determine a path for a call from a source user to a destination user. The routing algorithm of PNNI is a type of link state routing algorithm whereby each node is responsible for meeting its neighbors and learning their identities. Nodes learn about each other via the flooding of PTSEs described hereinabove. Each node computes routes to each destination user using the information received via the PTSEs to form a topology database representing a view of the network.
Using the Hello protocol and related FSM of PNNI, neighboring nodes learn about each other by transmitting a special Hello message over the link. This is done on a continual periodic basis. When a node generates a new PTSE, the PTSE is flooded to the other nodes within its peer group. This permits each node to maintain an up to date view of the network. Additional information on link state routing can be found in Section 9.2 of the book Interconnections: Bridges and Routers by Radia Perlman, Addison-Wesley, 1992, incorporated herein by reference.
Once the topology of the network is learned by all the nodes in the network, routes can be calculated from source to destination users. A routing algorithm commonly used to determine the optimum route from a source node to a destination node is known as the Dijkstra algorithm. The Dijkstra algorithm is used to generate the Designated Transit List which is the routing list used by each node in the path during the setup phase of the call. Used in the algorithm are the topology database (link state database) which includes the PTSEs received from each node, a Path List comprising a list of nodes for which the best path from the source node has been found and a Tentative List comprising a list of nodes that are possible best paths. Once it is determined that a path is in fact the best possible, the node is moved from the Tentative List to the Path List.
The algorithm begins with the source node (self) as the root of a tree by placing the source node ID onto the Path List. Next, for each node N placed in the Path List, N's nearest neighbors are examined. For each neighbor M, add the cost of the path from the root to N to the cost of the link from N to M. If M is not already in the Path List or the Tentative List with a better path cost, add M to the Tentative List.
If the Tentative List is empty, terminate the algorithm. Otherwise, find the entry in the Tentative List with the minimum cost. Move that entry to the Path List and repeat the examination step described above.
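The steps above can be sketched as follows, with the Path List and the Tentative List each held in a dictionary. The function name `shortest_paths`, the adjacency dictionary `links` and the example costs are assumptions chosen for illustration; generation of the Designated Transit List from the result is not shown.

```python
def shortest_paths(links, source):
    """links maps a node to {neighbor: link cost}.

    Returns a dictionary mapping each reachable node to a tuple of
    (path cost from the source, previous node on the best path).
    """
    path = {source: (0, None)}   # Path List: best path is known
    tentative = {}               # Tentative List: candidate best paths
    current = source
    while True:
        base_cost = path[current][0]
        for neighbor, link_cost in links.get(current, {}).items():
            if neighbor in path:
                continue         # best path already found
            cost = base_cost + link_cost
            if neighbor not in tentative or cost < tentative[neighbor][0]:
                tentative[neighbor] = (cost, current)
        if not tentative:
            return path          # Tentative List empty: terminate
        # Move the minimum cost entry from the Tentative List to the Path List.
        current = min(tentative, key=lambda n: tentative[n][0])
        path[current] = tentative.pop(current)


links = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"C": 1},
}
routes = shortest_paths(links, "A")
```

In this example the best path to C runs through B at a cost of 3, rather than over the direct link of cost 4.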
More detailed information on the Dijkstra algorithm can be found in Section 9.2.4 of the book Interconnections: Bridges and Routers by Radia Perlman, Addison-Wesley, 1992, incorporated herein by reference.
In PNNI routing, one of the first steps performed in the route calculation process is to find a full match in the topology database on the destination address. Note that in PNNI, a full match search is always performed as opposed to a partial match search, regardless of the length of the address. This means that the longest of all the addresses in the database that exactly matches the destination address is to be searched for. A more detailed description of full matching versus partial matching can be found in U.S. Pat. No. 5,940,396 entitled "METHOD OF ROUTING IN AN ASYNCHRONOUS TRANSFER MODE NETWORK," similarly assigned and incorporated herein by reference in its entirety.
In performing the search of the topology database, a simple sequential search will work but it is very inefficient in both time and computing resources. Other, more efficient search techniques are known some of which are described hereinbelow.
A well known radix search method is the digital tree search which is based on the binary tree search. The difference is that the decision to branch is based on the bits of the key rather than on the results of a comparison between the keys. At the first level the leading bit is used, followed by the second leading bit at the second level, and so on until an external node is found.
In cases where the search keys are relatively long as in network applications, the cost of comparing a search key for equality with a key from the tree can be a major cost factor. Digital tree searching uses such a comparison at each node of the tree.
Radix search trees do not store keys in the tree nodes but rather store the keys in external nodes of the tree. Thus, there are two types of nodes: (1) nodes which contain links to other nodes and (2) nodes which comprise keys and no links. This search method is known as a 'trie' for its usefulness for retrieval operations. A search for a key in such a tree proceeds by taking branches in accordance with the key's bits. No compare operation is performed until a key on the tree is reached. Each key in the tree is stored on the path described by the leading bit pattern of the key and each search key winds up at a key, thus requiring only one full key comparison to complete the search.
The radix search method described above has two disadvantages: (1) one way branching leads to the creation of extra nodes in the tree and (2) the tree comprises two different types of nodes. Both disadvantages are addressed by the Practical Algorithm To Retrieve Information Coded In Alphanumeric search algorithm, better known as the Patricia search trie method. The Patricia method permits searching for N arbitrarily long keys in a tree with only N nodes and only requires one full key comparison per search. One way branching is avoided by each node containing the index of the bit to be tested to decide which path to take out of that particular node. External nodes are avoided by replacing links to external nodes (keys) with links that point upwards in the tree. The keys in the tree are stored in the nodes for reference when the bottom of the tree is reached.
The search in such a tree begins at the root and proceeds down the tree, using the bit index in each node to determine which bit to examine in the search key. If the bit is '0' the left direction is taken and if the bit is '1' the right direction is taken. The keys in the nodes are not examined at all on the way down the tree. Eventually, an upwards link is encountered whereby each upward link points to the unique key in the tree that has the bits that would cause a search to that link. If the key at the node pointed to by the first upward link encountered is equal to the search key, then the search is successful, otherwise it is not. For tries, i.e., retrieving, all searches terminate at external nodes, whereupon one full key comparison is performed to determine whether or not the search is successful.
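A minimal sketch of a conventional Patricia tree of this kind is given below, following the general shape of the textbook algorithm. It assumes fixed width non-negative integer keys of at most `MAX_BIT` bits, with bits numbered from the least significant end, and it reserves the key 0 for the head node; the names `PatriciaNode`, `search`, `insert` and `contains` are illustrative only. An upward link is recognized when the bit index fails to decrease along a link.

```python
MAX_BIT = 8   # assumed key width: keys are integers in 1 .. 2**MAX_BIT - 1


def bit(value, index):
    return (value >> index) & 1


class PatriciaNode:
    def __init__(self, key, bit_index):
        self.key = key
        self.bit_index = bit_index
        self.left = self.right = self   # links initially point at the node itself


def search(head, key):
    """Follow the key's bits until an upward link is taken; no key compares."""
    node = head
    while True:
        parent = node
        node = node.right if bit(key, node.bit_index) else node.left
        if parent.bit_index <= node.bit_index:   # upward link reached
            return node


def contains(head, key):
    return search(head, key).key == key          # the single full comparison


def insert(head, key):
    found = search(head, key)
    if found.key == key:
        return found
    index = MAX_BIT                              # find the first differing bit
    while bit(key, index) == bit(found.key, index):
        index -= 1
    node = head                                  # descend to the insertion point
    while True:
        parent = node
        node = node.right if bit(key, node.bit_index) else node.left
        if node.bit_index <= index or parent.bit_index <= node.bit_index:
            break
    new = PatriciaNode(key, index)
    if bit(key, index):
        new.left = node                          # right link stays on the new node
    else:
        new.right = node                         # left link stays on the new node
    if bit(key, parent.bit_index):
        parent.right = new
    else:
        parent.left = new
    return new


head = PatriciaNode(0, MAX_BIT)                  # head node holds the reserved key 0
for k in (5, 3, 7):
    insert(head, k)
```

Each inserted key occupies exactly one node, and `contains` performs its only key comparison after the descent has finished.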
More detailed descriptions of the binary, radix and Patricia search tree search algorithms can be found in Chapter 17 of Algorithms by Robert Sedgewick, Addison-Wesley, 1988, incorporated herein by reference.
A major disadvantage, however, of the conventional Patricia search trie algorithm is that no key can be a prefix of another key. Patricia search trees can handle variable length keys with the limitation that if one key is the prefix of another, the algorithm operates such that one of the keys will be dropped. The rule in this case is that there can only be one key that matches. This search algorithm is suitable when all the keys are unique and no one key is a prefix of another key. In a Patricia search, there is no such thing as a best match because the key is either in the tree or it is not. Recall that a full compare is only performed once, on the last leaf.
A diagram illustrating a portion of an example prior art Patricia search trie tree having a root and a single node is shown in FIG. 1. The node 12 comprises an index of 2 and a left pointer 16 to address C and a right pointer 18 to address B. The network comprises four nodes with the listed addresses A through D. Having knowledge of the network topology, each node attempts to insert the node prefixes into a Patricia tree. It is assumed that addresses B and C have already been inserted into the tree. The node then attempts to insert address A into the tree. Since address A differs in the second bit, a right branch is taken. A problem occurs because address B is a prefix of address A. Both addresses A and B cannot coexist in the tree simultaneously. One of the two addresses will be dropped.
Thus, the conventional search algorithms discussed above are not suitable for use in networks such as ATM. This is because the addressing structure of ATM networks based on PNNI routing is based on permitting addresses to be prefixes of other addresses. In PNNI based ATM networks, PNNI reachable addresses are intentionally selected to be prefixes of one another due to the address summarization feature of PNNI which permits network designers to construct large hierarchical networks. Large numbers of nodes are grouped into peer groups which are assigned logical addresses. Peer groups are formed in a hierarchical fashion with the addresses assigned to higher level peer groups being prefixes of the addresses assigned to lower level peer groups.
Note that the keys, which are comprised of a string of bits, as used in the prior art search algorithms are replaced with network addresses when applied to PNNI based routing in an ATM network.
One prior art solution is provided in "Routing on Longest-Matching Prefixes," W. Doeringer, G. Karjoth and M. Nassehi, IEEE Transactions on Networking, Vol. 4, No. 1, February 1996. The method describes a compact digital trie termed dynamic prefix tries with the ability to insert, delete and retrieve binary keys in a dynamic database. The binary keys can have arbitrary length and may be prefixes of one another which makes this solution suitable for use with routing algorithms.
A diagram illustrating the node structure for a prior art compact digital trie as described in the Doeringer et al. reference is shown in FIG. 2. The node structure 20 comprises an index 22, left key 24, right key 26, parent pointer 28, left subtrie pointer 30 and right subtrie pointer 32.
A major drawback to this approach is that it is much more complicated when compared to the Patricia search trie algorithm and the other conventional search trie algorithms. The algorithm makes heavy additions to the conventional Patricia algorithm in terms of complexity and computing resources needed. For example, it changes the structure of the basic tree, significantly modifies the process of entering keys and searching for a match and adds complexity by maintaining parent pointers and the ability to traverse the tree bi-directionally.
Further, additional memory resources are required for this algorithm over the Patricia algorithm. In particular, additional memory space is required for the node structure which comprises pointers to left and right keys, left and right nodes for subtree branches and a parent pointer.
The present invention is a method of searching utilizing a longest match based Radix Search Trie with variable length keys and having the ability to handle keys being prefixes of other keys. The method of the present invention is based on the well known Patricia search trie algorithm. Although the conventional Radix Search Trie tree method is used, the address prefixes representing the keys for the tree are modified before being inserted into the algorithm. An extra byte is added to the beginning of each address prefix. The byte that is added is equal to the length of the address prefix. The combined address length byte followed by the address prefix is then used as the key for performing the Patricia trie tree algorithm. Adding a byte holding the length to the address serves to make the address prefix unique. Thus, when one address is the prefix of another, the length byte will make both addresses unique and distinct from each other. The conventional Patricia search can now be used since the keys have been made unique.
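The key modification can be illustrated with a short sketch. It assumes, for simplicity, that an address prefix is held as a whole number of bytes and that the added byte counts bytes; this passage does not state the unit, and the function name `make_key` and the example addresses are hypothetical.

```python
def make_key(prefix: bytes) -> bytes:
    """Prepend one byte holding the prefix length (assumed here to be in bytes)."""
    if len(prefix) > 255:
        raise ValueError("length does not fit in one byte")
    return bytes([len(prefix)]) + prefix


address_b = b"\x47\x00"          # hypothetical shorter address prefix
address_a = b"\x47\x00\x01"      # hypothetical longer prefix that begins with B

key_b = make_key(address_b)      # b"\x02\x47\x00"
key_a = make_key(address_a)      # b"\x03\x47\x00\x01"
```

Although `address_b` is a prefix of `address_a`, the two modified keys already differ in their first byte, so neither is a prefix of the other and both can be stored in a conventional Patricia tree.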
Addresses of variable length are handled by creating and maintaining a prefix list comprising an entry for each distinct value of address length for all the nodes in the topology database. The entries are stored in a circularly linked list sorted in descending numerical order of the length field. This ensures that the address being the longest match will be found when searching for a destination address.
There is provided in accordance with the present invention a method of searching utilizing a conventional Patricia search tree constructed from one or more nodes for storing one or more keys, the method comprising the steps of inserting a key into the tree: modifying the keys to be inserted into the tree so as to make each key unique with respect to other keys that may be prefixes thereof, inserting the modified key into the tree utilizing a conventional Patricia search tree algorithm, providing a list which includes entries for each different key length represented in the tree, each entry in the list including a length field and a count field, updating the list so as to maintain the entries in descending numerical order of the length field; searching the tree for a search key: determining the largest key length in the list, concatenating the key length onto the search key to form a modified search key, searching the tree with the modified search key utilizing a conventional Patricia search algorithm and determining the next largest key length in the list and repeating the steps of concatenating and searching until either the modified search key is found or the entries in the list are exhausted.
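The insert and search steps recited above can be sketched as follows. To keep the example short, a Python dictionary stands in for the conventional Patricia search tree and a sorted Python list stands in for the list of key lengths; both substitutions, the class name `LongestMatchTable`, the byte granularity of the lengths and the truncation of the search key to each candidate length before the lookup are assumptions made for the sketch.

```python
class LongestMatchTable:
    def __init__(self):
        self.tree = {}       # stand-in for the Patricia tree: modified key -> value
        self.lengths = []    # distinct key lengths, in descending order

    def insert(self, prefix: bytes, value):
        key = bytes([len(prefix)]) + prefix       # length byte makes the key unique
        if len(prefix) not in self.lengths:
            self.lengths.append(len(prefix))
            self.lengths.sort(reverse=True)       # keep descending order
        self.tree[key] = value

    def search(self, address: bytes):
        # Try the largest key length first, then the next largest, and so on.
        for length in self.lengths:
            if length > len(address):
                continue
            modified = bytes([length]) + address[:length]
            if modified in self.tree:
                return self.tree[modified]        # longest match found
        return None                               # list exhausted without a match


table = LongestMatchTable()
table.insert(b"\x47\x00", "B")
table.insert(b"\x47\x00\x01", "A")
table.insert(b"\x39", "C")

longest = table.search(b"\x47\x00\x01\x99")   # matches the 3 byte prefix
shorter = table.search(b"\x47\x00\x02")       # falls back to the 2 byte prefix
missing = table.search(b"\x50")
```

Because the lengths are visited in descending order, the first successful lookup is necessarily the longest matching prefix.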
The step of modifying a key comprises the step of adding a number represented by a fixed length to the beginning of a key, the number having a value equal to the length of the key. The step of modifying a key comprises the step of adding a byte of data to the beginning of a key, the byte of data having a value equal to the length of the key. Also, the keys may be of variable length and may be prefixes of other keys in the tree. In addition, the keys may comprise address prefixes.
The byte of data may comprise a Private Network Node Interface (PNNI) level value and the list may comprise a circularly linked list. The method further comprises the step of incrementing the count field of an entry in the list when a key is added to the tree, the entry corresponding to the length of the key and the step of decrementing the count field of an entry in the list when a key is removed from the tree, the entry corresponding to the length of the key.
Also, the method further comprises the step of removing an entry from the list when a key is removed from the tree and the key was the last key with that particular length and further comprises the step of storing a plurality of identical keys in the tree. The step of storing may comprise the step of storing the plurality of identical keys in a circularly linked list, the circularly linked list stored in the tree in similar fashion to that of other keys.
In addition, the step of determining the largest key length in the list may comprise the step of maintaining a longest length variable indicating the entry in the list with the longest length. The step of determining the next largest length in the list may comprise the step of traversing the list to the next entry pointed to by the current entry.
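The maintenance of the length list, including its count field, can be sketched as follows. A plain Python list of `[length, count]` pairs is used here in place of the circularly linked list, and the class name `LengthList` and its method names are hypothetical; the first entry plays the role of the longest length variable.

```python
class LengthList:
    def __init__(self):
        self.entries = []   # [length, count] pairs in descending order of length

    def add(self, length):
        """Called when a key of the given length is added to the tree."""
        for i, entry in enumerate(self.entries):
            if entry[0] == length:
                entry[1] += 1                      # increment the count field
                return
            if entry[0] < length:
                self.entries.insert(i, [length, 1])   # new entry, order preserved
                return
        self.entries.append([length, 1])

    def remove(self, length):
        """Called when a key of the given length is removed from the tree."""
        for i, entry in enumerate(self.entries):
            if entry[0] == length:
                entry[1] -= 1                      # decrement the count field
                if entry[1] == 0:
                    del self.entries[i]            # last key of this length is gone
                return
        raise KeyError(length)

    def longest(self):
        return self.entries[0][0] if self.entries else None


lengths = LengthList()
for n in (13, 19, 13, 7):
    lengths.add(n)
after_adds = [list(e) for e in lengths.entries]   # snapshot of the list
lengths.remove(13)
lengths.remove(19)
```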