The present invention relates generally to the field of transmitting data packets on a wide-area data communication network such as the Internet. More particularly, the present invention relates to routing data packets by using the destination address field of a data packet to retrieve an output port address from a trie data structure.
In the data communication network known as the Internet, a device known as a router routes data packets. The router may be connected to (1) a local area network composed of host computers and/or (2) a plurality of other routers. Each data packet originates at a specific computer and may include voice, music, full-motion video or data information. The data packet is then transmitted from one router to another by action of the circuitry of the routers. Eventually, the data packet is transmitted to a router that is directly connected to the computer that is the intended recipient of the data packet.
A central problem in designing a router is maximizing the efficiency of the router in determining the proper output port address for the router to use in transmitting the presented data packet. This process should use as few clock cycles as possible; otherwise, the presented data packet would have to be stored while awaiting processing, and this storage or buffering would increase the hardware cost of the router.
In the simplest form of routing, the circuitry of the router would have the capability to determine the appropriate output port address for every possible destination address field of a presented data packet. Although such an approach was viable in the early days of the Internet, it is no longer a useful option given the present size of the Internet. As an alternative, several different techniques have been used such as binary trees, b-trees and hash tables. Most of these techniques do not attempt to associate an output port address with the entire destination address field but with one or more of the most significant bits of the destination address. (The binary number represented by one or more of the most significant bits of the destination address field is commonly referred to in the prior art as a prefix.) More specifically, these techniques use the longest prefix for which there is an associated output port address as the basis for selecting the output port address for transmitting the presented data packet.
As an example of the functioning of one of these techniques, assume that a data packet contains a destination address field whose most significant bits are the four-bit number '1011'. Four of the possible prefixes of the destination address field of this data packet are '1', '10', '101' and '1011'. Further assume that the circuitry of the router has access to a database that contains an output port address associated with each of these prefixes. Although several output port addresses are associated with this single destination address field, the circuitry of the router transmits the presented data packet in accordance with the output port address associated with the longest of these prefixes. Accordingly, the presented data packet would be transmitted to the router identified by the output port address associated with the prefix '1011'. If the database did not contain an output port address associated with the prefix '1011', then the output port address associated with '101' would be used. If the database did not contain an output port address associated with the prefix '101' either, the process would continue in similar fashion with the shorter prefixes. If no output port address were associated with any prefix of the destination address field of the presented data packet, the router might use a default output port address.
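The selection rule in the example above can be sketched in Python as a naive search over a table keyed by prefix. This is a minimal illustration under stated assumptions, not a router implementation: prefixes are modeled as bit strings, and the function name `longest_prefix_match` and the port names "w", "x", "y", "z" and "dflt" are hypothetical.

```python
from typing import Dict, Optional

def longest_prefix_match(address_bits: str, table: Dict[str, str],
                         default: Optional[str] = None) -> Optional[str]:
    """Return the port for the longest prefix of address_bits found in table."""
    # Try prefixes from longest to shortest; the first hit is the longest match.
    for length in range(len(address_bits), 0, -1):
        port = table.get(address_bits[:length])
        if port is not None:
            return port
    return default  # no prefix matched: fall back to the default port

# Hypothetical routing table for the '1011' example.
routes = {"1": "w", "10": "x", "101": "y", "1011": "z"}
print(longest_prefix_match("1011", routes))          # z
del routes["1011"]
print(longest_prefix_match("1011", routes))          # y
print(longest_prefix_match("0110", routes, "dflt"))  # dflt
```

A real router would not scan every prefix length this way; the trie data structure described below exists precisely to avoid that cost.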
The longest prefix is selected because it contains the most information about the ultimate destination of the data packet; accordingly, the output port address associated with the longest prefix is the best choice in light of the information available.
The goal then in designing algorithms in the field to which the present invention pertains is to develop an algorithm that most efficiently (1) determines the longest prefix from the destination address field for which there is an associated output port address and (2) retrieves that associated port address for use by the circuitry of the router in transmitting the presented data packet.
One common algorithm in the prior art that is designed to satisfy the two above goals uses the data structure commonly known as a "trie". The trie data structure is a type of tree data structure with a root node, intermediate level nodes and external leaf nodes. Each node consists of a predetermined plurality of words. Each word is composed of a predetermined plurality of bits, which are divided into fields.
In one embodiment in the prior art, each word contains the following fields:
The stop flag field, which indicates to the circuitry of the router if retrieval stops at the node that holds the word or should continue to a higher level node whose address is contained in the forward pointer field, which is discussed below.
The valid flag field, which indicates to the circuitry of the router whether the output port address contained in this word is valid.
The output port address field, which contains, where applicable, the output port address of the next router to which the circuitry of the router would transmit the presented data packet.
The children counter field, which indicates to the circuitry of the router the number of words in the node that contain useful information. In this context, useful information is either a word with a valid output port address (that is, the valid flag field is set to '1') or a word with a valid forward pointer field. In the latter case, the stop flag field would be set to '0'. The children counter field is contained only in the first word of the relevant node.
The forward pointer field, which contains, if the stop flag field is not '1', the address of the next node of the trie data structure in the memory of the circuitry of the router.
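To illustrate how the fields listed above might share the bits of a single word, the following Python sketch packs them into one integer. The field widths (8-bit port, 5-bit children counter, 16-bit forward pointer), the field order and the class name `Word` are hypothetical choices for this example only; the prior art described here does not fix them. A 5-bit counter is enough to count all 16 words of a node addressed by 4-bit substrings.

```python
from dataclasses import dataclass

# Hypothetical field widths, for illustration only.
PORT_BITS = 8      # output port address field
COUNT_BITS = 5     # children counter field (0..31 covers a 16-word node)
PTR_BITS = 16      # forward pointer field

@dataclass
class Word:
    stop: int = 1      # stop flag: 1 means retrieval ends at this node
    valid: int = 0     # valid flag: 1 means the port field is valid
    port: int = 0      # output port address
    children: int = 0  # children counter (meaningful only in word 0)
    forward: int = 0   # address of the next node when stop == 0

    def pack(self) -> int:
        """Concatenate the fields, stop flag first, into one integer."""
        value = 0
        for field, width in ((self.stop, 1), (self.valid, 1), (self.port, PORT_BITS),
                             (self.children, COUNT_BITS), (self.forward, PTR_BITS)):
            if not 0 <= field < (1 << width):
                raise ValueError(f"value {field} does not fit in {width} bits")
            value = (value << width) | field
        return value

    @classmethod
    def unpack(cls, value: int) -> "Word":
        """Inverse of pack(): peel fields off from the least significant end."""
        fields = []
        for width in (PTR_BITS, COUNT_BITS, PORT_BITS, 1, 1):
            fields.append(value & ((1 << width) - 1))
            value >>= width
        forward, children, port, valid, stop = fields
        return cls(stop, valid, port, children, forward)

word = Word(stop=0, valid=1, port=5, children=2, forward=0x1234)
packed = word.pack()
assert Word.unpack(packed) == word
print(hex(packed))  # 0x20a21234
```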
This trie data structure supports the actions of the circuitry of a router in (1) retrieving an output port address from the trie data structure, (2) inserting an output port address into the trie data structure and (3) deleting an output port address from the trie data structure.
In accordance with a common technology in the prior art, the destination address field of the presented data packet is parsed by the action of the circuitry of the router into several smaller binary numbers of an equal predetermined length. (For purposes of this specification and claims, each of the smaller binary numbers is referred to as a substring.) For example, if the predetermined length was 4 bits, then a 32 bit destination address would be broken into 8 substrings each of 4 bits.
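A minimal Python sketch of this parsing step follows. It assumes the destination address is held as an unsigned integer. The function name `split_address` and its default parameters are illustrative only.

```python
from typing import List

def split_address(address: int, address_bits: int = 32, stride: int = 4) -> List[int]:
    """Split an address into substrings of `stride` bits, most significant first."""
    if address_bits % stride:
        raise ValueError("address length must be a multiple of the substring length")
    mask = (1 << stride) - 1
    # Shift each substring down to the low bits, starting from the top of the address.
    return [(address >> shift) & mask
            for shift in range(address_bits - stride, -1, -stride)]

print(split_address(0xC0A80001))  # [12, 0, 10, 8, 0, 0, 0, 1]
```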
The number of words in each node is equal to 2 raised to the power of the predetermined number of bits in each substring, and each substring is the address of a word in the appropriate node. (This predetermined number of bits is referred to in this specification and claims as the "dimension" of the node.) The circuitry of the router first processes the word in the root node whose address is the initial substring. If the valid flag field of that word is '1', then the binary number stored in the output port address field of that word is copied to an internal register, so that the last valid output port address is available to the circuitry of the router at any point. If the stop flag field of that word is also '1', then the presented data packet is transmitted using the output port address stored in the output port address field of that word. If the stop flag field is '0', then a word of the next higher-level node, whose address is stored in the forward pointer field of that word, is processed next: the circuitry of the router processes the word in that node whose address is the second substring.
In summary, the process continues until a word whose stop flag field is '1' is processed. (For purposes of this specification and claims, the node that contains this word is referred to as the terminal node.) At this point, retrieval ceases. If the valid flag field of that word is '1', the circuitry of the router transmits the data packet to the router associated with the output port address stored in the output port address field of the relevant word of the terminal node. If the valid flag field is '0', the last valid output port address, stored in the internal register, is used.
It is important to note that each prefix of the destination address field of a presented data packet maps to a single output port address; the reverse, however, is not the case: one output port address may be mapped to by many prefixes.
By way of an example, FIG. 1 shows a representation of a simple trie data structure. The trie data structure of FIG. 1 has three levels. The first level consists of root node S0. The second level is an intermediate level that consists of node S1. The third level consists of leaf node S2.
Assume that a data packet with a destination address whose most significant bits are '001110000000' is presented to a router using the trie data structure of FIG. 1. The circuitry of the router first parses these bits into three 4-bit substrings: '0011', '1000' and '0000'. It then processes the word in the root node S0 whose address is '0011'. Since the valid flag field of this word is '1', the output port address "e" is stored in an internal register. (For purposes of this specification, "a", "b" and similar single-character strings enclosed in double quotes represent specific output port addresses.) Since the stop flag field of the word being processed in node S0 is '0', the circuitry of the router proceeds to the next node of the trie data structure, S1, and processes the word in node S1 whose address is '1000'. The circuitry of the router does not store the output port address contained in the output port address field of this word in the internal register, because the valid flag field of this word is '0'. Since the stop flag field of this word is also '0', the circuitry of the router next processes the word in node S2 whose address is '0000'. Since both the stop flag field and the valid flag field of this word are '1', the circuitry of the router transmits the presented data packet to the router whose output port address is "p".
As another example, assume that a data packet with a destination address field whose most significant bits are '001110000001' is presented to the router. The retrieval is identical to the one described in the previous paragraph except for the last substring: instead of processing the word in node S2 at address '0000', the circuitry of the router processes the word at address '0001'. Since the stop flag field of this word is '1', the retrieval stops. Since the valid flag field is '0', the data packet is transmitted to the router associated with output port address "e", which was the last valid output port address stored in the internal register.
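The retrieval procedure and the two FIG. 1 examples above can be sketched in Python as follows. This is a hedged illustration rather than the router circuitry: nodes are modeled as Python lists of 16 words, node names such as "S1" stand in for the memory addresses held in the forward pointer field, and the function names `new_node` and `lookup` are hypothetical. Only the words of FIG. 1 that the examples mention are reconstructed; every other word keeps the defaults assumed for newly inserted nodes (stop flag '1', valid flag '0').

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

STRIDE = 4  # bits per substring, so each node holds 2**4 = 16 words

@dataclass
class Word:
    stop: int = 1                  # defaults assumed for newly inserted nodes
    valid: int = 0
    port: Optional[str] = None     # output port address
    forward: Optional[str] = None  # name of the next node when stop == 0

def new_node() -> List[Word]:
    return [Word() for _ in range(2 ** STRIDE)]

def lookup(trie: Dict[str, List[Word]], bits: str, root: str = "S0",
           default: Optional[str] = None) -> Optional[str]:
    register = default             # last valid output port address seen
    node = trie[root]
    for i in range(0, len(bits), STRIDE):
        word = node[int(bits[i:i + STRIDE], 2)]  # substring addresses a word
        if word.valid:
            register = word.port
        if word.stop:              # this node is the terminal node
            return register
        node = trie[word.forward]
    return register                # address bits ran out before a stop flag

# Words of FIG. 1 mentioned in the two examples; all other words keep defaults.
trie = {"S0": new_node(), "S1": new_node(), "S2": new_node()}
trie["S0"][0b0011] = Word(stop=0, valid=1, port="e", forward="S1")
trie["S1"][0b1000] = Word(stop=0, valid=0, forward="S2")
trie["S2"][0b0000] = Word(stop=1, valid=1, port="p")

print(lookup(trie, "001110000000"))  # p
print(lookup(trie, "001110000001"))  # e
```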
Insertion of an additional prefix in the trie data structure has two cases. In the first case, no new nodes are inserted in the trie data structure. In the second case, a new node must be added to the trie data structure. Referring again to FIG. 1, if output port address "b" associated with the prefix '0101' is to be inserted, then the circuitry of the router sets the valid flag field of the word with address '0101' in node S0 to '1' and inserts "b" in the output port address field of this word. The circuitry of the router then increments the children counter field, held in the word of node S0 at address '0000', from 2 to 3. The circuitry of the router does not alter the stop flag field because it is already '1'.
If prefix '1001010000100000' associated with output port address "d" is inserted in the trie data structure represented in FIG. 1, then the sequence of steps is more involved, since a new node must be inserted in the trie data structure. The circuitry of the router first processes the word located at address '1001' in node S0. The stop flag field of this word is altered from '1' to '0'; as a consequence, the children counter field of the root node S0 changes from 2 to 3. Similar actions take place with respect to nodes S1 and S2. The circuitry of the router then appends node S3 to the trie data structure and increments the children counter field of node S3 from 0 to 1. The circuitry of the router sets the valid flag field of the word at address '0000' in node S3 to '1' and inserts "d" in the output port address field of that word. No change is necessary to the stop flag field because, for purposes of these examples, it is assumed that in all newly inserted nodes all stop flag fields are initially set to '1' and all valid flag fields are initially set to '0'.
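The two insertion cases can be sketched in Python under the same list-of-words model. The assumptions are that the children counter lives in word 0 of each node, that a word counts as useful when its valid flag is '1' or its stop flag is '0', and that new nodes receive hypothetical names "S1", "S2", and so on. The sketch handles only prefixes whose length is a multiple of the substring length. The function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

STRIDE = 4  # bits per substring

@dataclass
class Word:
    stop: int = 1                  # new words stop retrieval
    valid: int = 0                 # and hold no valid port
    port: Optional[str] = None
    children: int = 0              # children counter, used only in word 0
    forward: Optional[str] = None  # name of the next node when stop == 0

def new_node() -> List[Word]:
    return [Word() for _ in range(2 ** STRIDE)]

def insert(trie: Dict[str, List[Word]], bits: str, port: str, root: str = "S0") -> None:
    if len(bits) == 0 or len(bits) % STRIDE:
        raise ValueError("prefix length must be a positive multiple of STRIDE")
    subs = [int(bits[i:i + STRIDE], 2) for i in range(0, len(bits), STRIDE)]
    node = trie[root]
    for sub in subs[:-1]:
        word = node[sub]
        if word.stop:
            # Second case: no lower node yet, so append one and link it.
            n = len(trie)
            while f"S{n}" in trie:
                n += 1
            name = f"S{n}"
            trie[name] = new_node()
            if not word.valid:     # the word becomes useful only now
                node[0].children += 1
            word.stop, word.forward = 0, name
        node = trie[word.forward]
    word = node[subs[-1]]
    if not word.valid and word.stop:   # count the word if it was not useful yet
        node[0].children += 1
    word.valid, word.port = 1, port

trie = {"S0": new_node()}
insert(trie, "0101", "b")              # first case: no new node
insert(trie, "1001010000100000", "d")  # second case: appends S1, S2 and S3
print(sorted(trie), trie["S0"][0].children, trie["S3"][0].port)
```

Starting from an empty root rather than from FIG. 1, the last line prints `['S0', 'S1', 'S2', 'S3'] 2 d`.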
The trie data structure also lends itself well to efficient deletion of a prefix. If a prefix is deleted from the trie data structure, the node containing this prefix may or may not be removed. For example, removing output port address "a" associated with prefix '00110100' is accomplished by setting the valid flag field of the word at address '0100' in node S1 to '0' and decrementing the children counter field of node S1 to 1. Node S1 is not deleted because it contains other valid entries.
To perform deletion correctly, the circuitry of the router uses a deletion stack. The address of every word that the search traverses is pushed onto this stack. After the search, the addresses are popped one by one, and the nodes containing them are processed. Deletion stops when the circuitry of the router encounters the first node whose children counter field holds a value greater than zero.
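A Python sketch of deletion with an explicit stack follows. It uses the same hypothetical list-of-words model (children counter in word 0, node names standing in for addresses). The test trie mirrors the shape of the deletion example above, with one extra hypothetical entry "c" at address '1001' of node S1, so that S1 first survives and is then removed. The function name `delete` and that extra entry are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

STRIDE = 4

@dataclass
class Word:
    stop: int = 1
    valid: int = 0
    port: Optional[str] = None
    children: int = 0              # children counter, used only in word 0
    forward: Optional[str] = None

def new_node() -> List[Word]:
    return [Word() for _ in range(2 ** STRIDE)]

def delete(trie: Dict[str, List[Word]], bits: str, root: str = "S0") -> bool:
    """Remove the port stored for `bits`; return False if none was stored."""
    if len(bits) == 0 or len(bits) % STRIDE:
        raise ValueError("prefix length must be a positive multiple of STRIDE")
    stack: List[Tuple[str, int]] = []    # (node name, word address) per level
    name = root
    for i in range(0, len(bits), STRIDE):
        sub = int(bits[i:i + STRIDE], 2)
        stack.append((name, sub))
        word = trie[name][sub]
        if i + STRIDE < len(bits):
            if word.stop:
                return False             # the trie ends before the prefix does
            name = word.forward
    name, sub = stack.pop()
    word = trie[name][sub]
    if not word.valid:
        return False
    word.valid, word.port = 0, None
    if word.stop:                        # the word no longer holds useful information
        trie[name][0].children -= 1
    # Pop back toward the root, removing nodes left with no useful words.
    while name != root and trie[name][0].children == 0:
        del trie[name]
        name, sub = stack.pop()
        parent_word = trie[name][sub]
        parent_word.stop, parent_word.forward = 1, None
        if not parent_word.valid:
            trie[name][0].children -= 1
    return True

trie = {"S0": new_node(), "S1": new_node()}
trie["S0"][0b0011] = Word(stop=0, valid=1, port="e", forward="S1")
trie["S0"][0].children = 1
trie["S1"][0b0100] = Word(valid=1, port="a")
trie["S1"][0b1001] = Word(valid=1, port="c")   # hypothetical second entry
trie["S1"][0].children = 2

print(delete(trie, "00110100"), "S1" in trie, trie["S1"][0].children)  # True True 1
print(delete(trie, "00111001"), "S1" in trie)                           # True False
```

After the second call, the word at '0011' in S0 keeps its valid port "e" but has its stop flag reset to '1', so the root's children counter is unchanged.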
A serious drawback of the prior art is its inability to efficiently handle prefixes whose length is not an integral multiple of the dimension of the nodes of the trie data structure. (For purposes of this specification and claims, a prefix of this nature is referred to as a "subprefix".) For example, if the dimension of each node is four, the prior art does not provide an obvious method for inserting or deleting a prefix whose length is not an integral multiple of four. One approach has been to expand the subprefix into prefixes by concatenating the subprefix with the appropriate number of bits, as illustrated in FIG. 3. For example, if subprefix '1011010' were to be inserted into the trie data structure, then its output port address would have to be associated with both prefixes '10110100' and '10110101' for the output port address to be properly inserted in the trie data structure. The insertion, in itself, does not present a problem. Assume, however, that an output port address associated with the prefix '10110100' is subsequently inserted in the trie data structure. In that case, the output port address associated with subprefix '1011010' is overridden for that prefix (as appropriate), but if the output port address associated with '10110100' is later deleted, the output port address information associated with subprefix '1011010' is partially lost.
This loss of information occurs, as a general matter, whenever a prefix that overwrites an expanded subprefix is inserted in the trie data structure and is later deleted from it.
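The loss of information can be reproduced with a short Python sketch that models one node's worth of 8-bit prefixes as a dictionary. The helper name `expand` and the port names "s" and "t" are hypothetical; the expansion itself follows the approach described above.

```python
from typing import Dict, List

def expand(subprefix: str, dimension: int) -> List[str]:
    """Pad a subprefix with every bit combination up to the next multiple of `dimension`."""
    pad = -len(subprefix) % dimension
    if pad == 0:
        return [subprefix]
    return [subprefix + format(i, f"0{pad}b") for i in range(2 ** pad)]

table: Dict[str, str] = {}
for prefix in expand("1011010", 4):   # ['10110100', '10110101']
    table[prefix] = "s"               # port of the subprefix
table["10110100"] = "t"               # a longer prefix overrides one expanded entry
del table["10110100"]                 # and is later deleted
print(table.get("10110100"), table.get("10110101"))  # None s
```

The entry for '10110100' no longer carries "s", because the expanded copy was overwritten rather than remembered.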
One aspect of the invention is a trie data structure stored on a computer readable medium that is used for storing a first binary number associated with a second binary number, the trie data structure consisting of nodes, the nodes including a root node, the root node being a first node, at least one intermediate level node, the intermediate level node being directly linked to the root node and having other lower level nodes directly linked to the intermediate level node, and at least one leaf node that is directly linked to a higher level node and having no lower level nodes directly linked to that leaf node, each of the nodes being an associative table of binary numbers consisting of rows and columns with each number of the rows of each of these nodes being the same number, wherein, if the length of the second binary number is not an integral multiple of the number of rows, the first binary number is stored at a position defined by the intersection of a row and a column in one of the nodes, wherein that one node is determined by the second binary number and the row and column of the one node is determined by the second binary number.
Another aspect of the invention is a method for use in retrieving a first binary number associated with a second binary number in a trie data structure stored on a computer readable storage medium, the trie data structure consisting of nodes, the nodes including a root node, the root node being a first node, at least one intermediate level node, the intermediate level node being directly linked to the root node and having other lower level nodes directly linked to the intermediate level node, and at least one leaf node, the leaf node being directly linked to a higher level node and having no lower level nodes directly linked to the leaf node, each of the nodes being an associative table of binary numbers consisting of rows and columns with each number of the rows of each of the nodes being the same number, wherein, when the length of the second binary number is not an integral multiple of the number of rows, the first binary number is stored at a position defined by the intersection of at least one of the rows and at least one of the columns in at least one of the nodes, the method including the steps of:
i. parsing the second binary number into substrings, wherein each of the substrings has a number of binary digits equal to the dimension of each of the nodes except for the last substring whose number of binary digits is less than the dimension of each of the nodes;
ii. using the substrings to map a path from the root node to a terminal node holding the first binary number; and
iii. retrieving the first binary number from the terminal node.
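Step i can be illustrated with a short Python sketch. The function name `parse_substrings` and the choice to return the full-length substrings separately from the shorter trailing substring are assumptions made for clarity. The text does not specify here how the terminal node uses the trailing substring, so the sketch stops at parsing.

```python
from typing import List, Tuple

def parse_substrings(bits: str, dimension: int) -> Tuple[List[str], str]:
    """Split `bits` into full `dimension`-bit substrings plus a trailing
    substring shorter than `dimension` (empty if the length divides evenly)."""
    if dimension <= 0 or set(bits) - {"0", "1"}:
        raise ValueError("expected a positive dimension and a binary string")
    cut = len(bits) - len(bits) % dimension
    full = [bits[i:i + dimension] for i in range(0, cut, dimension)]
    return full, bits[cut:]

print(parse_substrings("1011010", 4))  # (['1011'], '010')
```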
Another aspect of the invention is a method for use in inserting a first binary number associated with a second binary number in a trie data structure stored on a computer readable storage medium, the trie data structure consisting of nodes, the nodes including a root node, the root node being a first node, at least one intermediate level node, the intermediate level node being directly linked to the root node and having other lower level nodes directly linked to the intermediate level node, and at least one leaf node, the leaf node being directly linked to a higher level node and having no lower level nodes directly linked to the leaf node, each of the nodes being an associative table of binary numbers consisting of rows and columns with each number of the rows of each of the nodes being the same number, wherein, when the length of the second binary number is not an integral multiple of the number of rows, the first binary number is stored at a position defined by the intersection of at least one of the rows and at least one of the columns in a terminal node, the method including the steps of:
i. parsing the second binary number into substrings, wherein each of the substrings has a number of binary digits equal to the dimension of each of the nodes except for the last substring whose number of binary digits is less than the dimension of each of the nodes;
ii. using the substrings to map a path from the root node to the terminal node associated with the first binary number; and
iii. storing the first binary number in the terminal node.
Another aspect of the invention is a method for use in deleting a first binary number associated with a second binary number in a trie data structure stored on a computer readable storage medium, the trie data structure consisting of nodes, the nodes including a root node, the root node being a first node, at least one intermediate level node, the intermediate level node being directly linked to the root node and having other lower level nodes directly linked to the intermediate level node, and at least one leaf node, the leaf node being directly linked to a higher level node and having no lower level nodes directly linked to the leaf node, each of the nodes being an associative table of binary numbers consisting of rows and columns with each number of the rows of each of the nodes being the same number, wherein, if the length of the second binary number is not an integral multiple of the number of rows, the first binary number is stored at a position defined by the intersection of a row and a column in a terminal node, the method including the steps of:
i. parsing the second binary number into substrings, wherein each of the substrings has a number of binary digits equal to the dimension of each of the nodes except for the last substring whose number of binary digits is less than the dimension of each of the nodes;
ii. using the substrings to map a path from the root node to the terminal node associated with the first binary number; and
iii. deleting the first binary number from the terminal node.
Another aspect of the invention is in a trie data structure stored on a computer readable storage medium that is used for transmitting a presented data packet, the trie data structure including a root node, one or more intermediate level nodes, and one or more leaf nodes, wherein each node is composed of a number of words, each of the words having a field composed of a plurality of bits that is processed by the circuitry of a router and causes the router to transmit the presented data packet in accordance with an output port address associated with the word, a field composed of a plurality of bits that is examined by the circuitry and indicates to the circuitry if a valid output port address is contained in the word, a field composed of a plurality of bits that is capable of containing an output port address, a field composed of a plurality of bits that is capable of containing the address in the computer readable storage medium of another node of the trie data structure, and a field composed of a plurality of bits that is capable of containing information indicating the number of words in the node that contain useful information, wherein the improvement includes:
i. a first field composed of a plurality of binary digits that is capable of containing information indicating either the number of words in the node that contain useful information or additional routing information associated with a subprefix;
ii. a second field composed of a plurality of binary digits that is capable of indicating that an output port address is associated with a full prefix; and
iii. a third field composed of a plurality of binary digits that is capable of containing information indicating the location of valid output port addresses in the first field.