IP scalability is somewhat limited by the address resolution mechanism. For example, an optical-fiber link of 80 Gb/s capacity carrying packets with a mean length of 2000 bits would require an address resolution mechanism capable of resolving some 40 million addresses per second. This capacity can be provided by brute-force duplication of an address-resolution mechanism. It can also be realized through more elegant designs.
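The 40-million figure is simply the link capacity divided by the mean packet length. The short sketch below reproduces that arithmetic and also derives the resulting time budget per lookup (about 25 ns); the helper name `required_lookup_rate` is hypothetical.

```python
def required_lookup_rate(link_bps: float, mean_packet_bits: float) -> float:
    """Address lookups per second needed to keep pace with a saturated link."""
    return link_bps / mean_packet_bits


# Figures quoted above: an 80 Gb/s link carrying 2000-bit packets on average.
rate = required_lookup_rate(80e9, 2000)
print(f"{rate:,.0f} lookups/s")            # 40,000,000 lookups/s
print(f"{1e9 / rate:.1f} ns per lookup")   # 25.0 ns per lookup
```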
In addition to legacy routers, nodes that attempt to control the quality of service by creating connections for otherwise connectionless traffic must be able to translate the addresses of incoming packets, whether those packets belong to connectionless or connection-based traffic. High-capacity nodes are desirable because they reduce the number of hops between origin and destination.
The address translation considered in this disclosure may be stated as follows, with the help of FIG. 1, which graphically illustrates the translation problem 100 in generic terms:
An address space S 102 contains a large number of addresses of equal length, i.e., each address has the same number of bits. Only a small subset of the addresses is assigned to network users. The assigned addresses may appear in clusters 104. Each assigned address 106 is associated with a translation. To determine the translation of an arbitrary address, the address must first be located in the address space 102. Each assigned address 106 in the address space 102 maps to an address 108 in a condensed address space (an address table) 110. In other words, a set R of M addressable entities is stored in an address table (routing table) 110. The addresses of the address table are preferably stored in consecutive locations in a memory. Each of the M entries has at least two fields, one containing an address and the other the corresponding translation. The translation may be the desired result or a pointer to a block of data containing the sought information about the address. R is a subset of the address space S. The address space S can be vast, having a very large number, N, of elements. Thus, while the set R can be stored in a memory device, the address space S cannot practically be stored. For example, in the Internet protocol IPv4, N is about four billion, while M is typically about sixty thousand. Thus the ratio N:M is of the order of 60,000.
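The condensed address table can be pictured as two parallel arrays in consecutive memory, one holding addresses and the other their translations. The sketch below assumes exact-match lookup and uses binary search to locate an address among the M stored entries without ever materializing the N-element space S; binary search is an illustrative choice, not a method stated here, and the class name `AddressTable` is hypothetical. Prefix matching, which real IP lookup requires, is not handled.

```python
import bisect
from typing import List, Optional, Tuple


class AddressTable:
    """Condensed address table: M (address, translation) pairs stored sorted
    in consecutive positions, standing in for a sparse space of N addresses."""

    def __init__(self, entries: List[Tuple[int, str]]) -> None:
        entries = sorted(entries)
        self.addresses = [a for a, _ in entries]      # field 1: address
        self.translations = [t for _, t in entries]   # field 2: translation

    def translate(self, address: int) -> Optional[str]:
        # Binary search needs O(log M) probes into the M-entry table.
        i = bisect.bisect_left(self.addresses, address)
        if i < len(self.addresses) and self.addresses[i] == address:
            return self.translations[i]
        return None  # the address is not a member of R


table = AddressTable([(0x0A000001, "port 3"), (0xC0A80001, "port 7")])
print(table.translate(0xC0A80001))  # port 7
print(table.translate(0x08080808))  # None
```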
As the above discussion shows, there is an enormous address space that is sparsely populated by a much smaller number of assigned addresses. The addresses are of fixed length B; for example, in IPv4 an address has a length of four bytes (B=32 bits). Because of practical requirements imposed on the operation of the network, the address space is divided so as to satisfy certain topological criteria and network layout considerations. Each address, of length B=32 bits for example, is segmented into two parts of J and B−J bits. The first segment, of J bits, is often called a prefix. Prefixes have different lengths (different numbers of bits), and a prefix can be embedded in another prefix in a hierarchical fashion. The prefixes are unique, and only the prefix portion of an address is meaningful to the network. The prefixes known to a network node are stored in an address table together with a translation for each prefix. If two or more addresses have the same prefix, they appear in the address table as a single entity. Thus, there are 2^(B−J) potential addresses corresponding to a prefix of J bits, many of which may be unused. With B=32 and J=20, for example, there are 2^12 (that is, 4096) addresses with the same prefix. The remaining B−J bits have only local significance to the receiving node. The purpose of the address-translation mechanism in a network node is to find the translation corresponding to any given prefix.
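The split into a J-bit prefix and a (B−J)-bit local part is a pair of shift and mask operations. The sketch below assumes J is known, as it is for a table entry (for an incoming address it is not), and uses the hypothetical helper `split_address` with the B=32, J=20 case from the text.

```python
from typing import Tuple


def split_address(address: int, prefix_len: int, width: int = 32) -> Tuple[int, int]:
    """Split a width-bit address into its J-bit prefix and (B-J)-bit suffix."""
    suffix_bits = width - prefix_len
    prefix = address >> suffix_bits                 # leading J bits
    suffix = address & ((1 << suffix_bits) - 1)     # locally significant bits
    return prefix, suffix


addr = (192 << 24) | (168 << 16) | (37 << 8) | 9   # 192.168.37.9
prefix, suffix = split_address(addr, prefix_len=20)
print(hex(prefix), hex(suffix))  # 0xc0a82 0x509
print(2 ** (32 - 20))            # 4096 addresses share each 20-bit prefix
```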
Thus, the translation problem is summarized as follows:
An address X in address space S is received from a source external to the translation device, and it is required to determine whether X belongs to the set R and, if so, to extract the corresponding translation and communicate it to the source or to an agent of the source. If the elements in R and S were always of the same length, the translation would be a relatively simple task. An element in R may, however, be of any length between a minimum and a maximum. In the IPv4 case, the address X has two segments, and the total length B is 32 bits. Only the first segment is of interest across the network. The translation device is aware of the total length of X but not of the length of either segment. Furthermore, several entries in R may be identical to a leading portion, starting from the first bit, of the incoming address X: two prefixes of lengths J and K, with K>J, may share the same first J bits. In the IPv4 hierarchical structure, the entry in R with the longest coincident leading portion, of K bits say, is the sought entry. If K=0, the address X does not belong to R.
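The rule above is longest-prefix matching. A deliberately naive sketch follows: because the device does not know the prefix length of X, it probes every candidate length from B down to 1 and returns the first hit, which is the longest coincident prefix (K); no hit corresponds to K=0. Storing R as a dictionary keyed by (prefix value, prefix length) is an assumption for illustration, and the B probes per address show why this is far too slow at the rates discussed earlier.

```python
from typing import Dict, Optional, Tuple

ADDRESS_BITS = 32  # B for IPv4


def longest_prefix_match(address: int, table: Dict[Tuple[int, int], str]) -> Optional[str]:
    """Return the translation of the longest prefix in R matching `address`.

    `table` maps (prefix_value, prefix_length) -> translation. The prefix
    length of `address` is unknown, so every length is tried, longest first.
    """
    for length in range(ADDRESS_BITS, 0, -1):
        key = (address >> (ADDRESS_BITS - length), length)
        if key in table:
            return table[key]   # first hit is the longest coincident prefix, K bits
    return None                 # K = 0: the address does not belong to R


R = {
    (0x0A, 8): "A",         # 10.0.0.0/8
    (0x0A0102, 24): "B",    # 10.1.2.0/24, embedded in 10.0.0.0/8
}
print(longest_prefix_match(0x0A010203, R))  # B  (10.1.2.3 matches both; /24 wins)
print(longest_prefix_match(0x0A090909, R))  # A
print(longest_prefix_match(0x0B000000, R))  # None
```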
FIG. 2 shows an address translation mechanism 120 that determines the translation (a desired output port, for example) corresponding to a requested address and then routes the respective packet, received at its input port, to the desired destination. The address to be translated may be either a destination address or a source address. A router receives packets at its input ports and routes them to the proper output ports according to the addresses found in the packets' headers. Referring to FIG. 2, which depicts a mechanism for packet parsing, a packet arrives at an ingress port 122 and a parsing unit 124 separates its address from the packet. At block 126, the address and the remainder of the packet are both assigned a cyclical request number, drawn from a pool of available numbers, for managing the address translation process. The cyclical request numbers range from zero to a number large enough to avoid depleting the pool. The assigned cyclical number is attached to both the address and the packet itself, as indicated in blocks 128 and 130 of FIG. 2. The address with its request number is sent to address translation block 132, while the packet is stored in packet storage 134, generally in contiguous locations. In a forwarding process, the address translation block 132 determines, from the packet's address, the destination to which the stored packet should be transported. Using the cyclical request number, the packet storage can be directly indexed and the packet can be associated with the translation of its address. At unit 140, the separated address is combined with the packet indexed in packet storage 134, and the combined packet, together with the translation result, is returned to the port at which the packet arrived or to any specified processing unit. The request number is returned to the pool of cyclical numbers for reuse when the translation is complete.
The ingress port is then ready to forward the packet whose address has just been translated to the desired egress port. It is also possible to identify both the data packet and its address by the ingress port number as well as by the translation request number.
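The request-number bookkeeping of FIG. 2 can be sketched in a few lines. In this sketch the pool size, the FIFO reuse order, and the stand-in `translate` function are all assumptions; `RequestNumberPool`, `receive`, and `complete` are hypothetical names loosely mapped to blocks 126, 134, and 140. The point it illustrates is that the request number doubles as a direct index into packet storage, so no search is needed to reunite a translation with its packet.

```python
from collections import deque
from typing import List, Optional, Tuple


class RequestNumberPool:
    """Cyclical request numbers 0..size-1; size is assumed large enough that
    the pool never runs dry while translations are outstanding."""

    def __init__(self, size: int) -> None:
        self.free = deque(range(size))

    def acquire(self) -> int:
        if not self.free:
            raise RuntimeError("request numbers depleted")
        return self.free.popleft()

    def release(self, number: int) -> None:
        self.free.append(number)   # returned for reuse once translation completes


def translate(address: int) -> int:
    """Hypothetical stand-in for address translation block 132 (returns a port)."""
    return address & 0xF


POOL_SIZE = 8                                              # assumed
pool = RequestNumberPool(POOL_SIZE)
packet_storage: List[Optional[bytes]] = [None] * POOL_SIZE  # packet storage 134


def receive(address: int, payload: bytes) -> Tuple[int, int]:
    """Parsing: tag the separated address and the packet with one request number."""
    req = pool.acquire()
    packet_storage[req] = payload    # storage is indexed by the request number
    return req, address              # this pair goes to the translation block


def complete(req: int, address: int, port: int) -> Tuple[int, int, bytes]:
    """Recombine address and packet by direct indexing, then free the number."""
    payload = packet_storage[req]
    packet_storage[req] = None
    pool.release(req)
    assert payload is not None
    return port, address, payload


req, addr = receive(0x0A010203, b"payload")
print(complete(req, addr, translate(addr)))  # (3, 167838211, b'payload')
```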
FIG. 3 illustrates a known queuing mechanism 160 at a communications node, which stores the packets arriving at each ingress port in that port's ingress buffer 162 and records the location of each stored packet 166 in a pointer array 164. The length of array 164 is at least equal to the number of cyclical translation request numbers. Array 164 is indexed by the assigned translation request number. When a packet assigned translation request number X is queued in position Y of buffer 162, the number Y is written into the Xth entry of array 164. When the translation of the head-of-line packet in buffer 162 is complete, the entry of array 164 corresponding to that packet's request number is reset to null.
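A minimal model of the FIG. 3 bookkeeping follows. The array sizes are assumptions, and `store` and `on_translation_complete` are hypothetical names. Writing Y into entry X on arrival and resetting that entry to null on completion is exactly what the pointer array does; buffer management beyond that is omitted.

```python
from typing import List, Optional

NUM_REQUESTS = 8     # assumed: number of cyclical request numbers
BUFFER_SLOTS = 16    # assumed ingress buffer capacity, in packets

buffer: List[Optional[bytes]] = [None] * BUFFER_SLOTS   # ingress buffer 162
pointer: List[Optional[int]] = [None] * NUM_REQUESTS    # pointer array 164


def store(request: int, slot: int, packet: bytes) -> None:
    """Queue a packet with request number X at buffer position Y."""
    buffer[slot] = packet
    pointer[request] = slot          # the Xth entry records Y


def on_translation_complete(request: int) -> bytes:
    """Locate the packet through the pointer array, then null the entry."""
    slot = pointer[request]
    if slot is None:
        raise KeyError(f"no packet queued under request {request}")
    pointer[request] = None          # entry X reset to null
    packet = buffer[slot]
    assert packet is not None
    return packet


store(request=5, slot=0, packet=b"head-of-line packet")
print(on_translation_complete(5))   # b'head-of-line packet'
print(pointer[5])                   # None
```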
U.S. Pat. No. 5,414,704 May 9, 1995, Spinney, describes an address lookup technique in packet communications links of Ethernet, token ring or FDDI type. The technique uses a combination of programmable hash algorithms, binary search algorithms, and a small content-addressable memory (CAM). The CAM is used for finding a direct match of an address. For a search of other global addresses, hashing is used to produce local addresses of smaller widths and a balanced binary tree search finds a desired address from the hash table and the translation table.
In U.S. Pat. No. 5,425,028 Jun. 13, 1995, Britton et al, protocol selection and address resolution for programs running in heterogeneous networks are described. According to their invention, a program address is registered in the network so that it becomes available to other programs that understand the address, even if they are running over a transport protocol that does not understand the address format.
U.S. Pat. No. 5,812,767 Sep. 22, 1998, Desai et al, describe a system that allows a user to register an address resolution routine, providing an address resolution procedure that a data link provider interface uses to resolve address conflicts. An information handling system includes a number of stations connected in a network configuration, each station including a processor, a storage and an I/O controller. The processor operates under control of an operating system control program which is divided into a user (application) space and a kernel (system) space.
In U.S. Pat. No. 5,796,944 Aug. 18, 1998, Hill et al, an address management circuit and method of operation, for use in a communications internetworking device, includes a search engine having first and second search circuits for concurrently searching a network address table for source and destination addresses of a frame received by the communications internetworking device. Memory read cycles of the source and destination address searches are interleaved to allow a memory access to occur during every system cycle to thereby rapidly complete the searches for both the source and destination addresses.
A single indexed memory provides a simple means of address translation. In the Internet Protocol, however, the use of a single indexed memory is impractical. Even with IPv4, which uses a 32-bit address, the number of indexed-memory entries would be about 4 billion. It is possible, however, to exploit the fact that the prefix length is typically significantly smaller than 32 for a large proportion of the prefixes in the prefix directory. This property facilitates multi-stage indexing. A two-stage indexing approach was adopted by Gupta, Lin, and McKeown in their paper titled "Routing lookups in Hardware at Memory Access Speeds", IEEE Infocom, 1998, pages 1240–1247. In their approach, a first memory is used for direct indexing using the first 24 bits of an IPv4 address. A second memory is used for prefixes longer than 24 bits. For each entry in the first memory that corresponds to a prefix longer than 24 bits, an indexed extension array of 256 entries is used to translate the respective address. If an extension array does not have a valid entry in a range {X to 255}, with X<256, then the indexed array can be truncated to a length of X. This may save memory storage to some extent.
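The two-stage scheme can be sketched as follows, as a simplified illustration rather than the authors' implementation. Each first-stage slot holds either a translation or a pointer to a 2^(W−F)-entry extension array. Prefixes are inserted shortest first so that longer ones overwrite the slots they cover (prefix expansion), which leaves every slot holding its longest matching prefix. The widths are parameters (W=32, F=24 in the paper) so that a small instance can run cheaply. The entry encoding, the class name `TwoStageTable`, and the omission of extension-array truncation are assumptions of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical first-stage entry: ("hop", translation) or ("ext", array index).
Entry = Optional[tuple]


class TwoStageTable:
    """Direct-indexed two-stage prefix table: at most two memory accesses."""

    def __init__(self, width: int, first_bits: int) -> None:
        self.w, self.f = width, first_bits
        self.first: List[Entry] = [None] * (1 << first_bits)
        self.ext: List[List[Optional[str]]] = []   # each holds 2**(w-f) entries

    def build(self, prefixes: Dict[Tuple[int, int], str]) -> None:
        """prefixes maps (prefix_value, prefix_length) -> translation."""
        for (value, length), hop in sorted(prefixes.items(), key=lambda kv: kv[0][1]):
            if length <= self.f:
                # Expand a short prefix over every first-stage slot it covers.
                start = value << (self.f - length)
                for i in range(start, start + (1 << (self.f - length))):
                    self.first[i] = ("hop", hop)
                continue
            top = value >> (length - self.f)
            entry = self.first[top]
            if entry is None or entry[0] != "ext":
                # New extension array inherits any shorter-prefix translation.
                inherited = entry[1] if entry is not None else None
                self.ext.append([inherited] * (1 << (self.w - self.f)))
                entry = ("ext", len(self.ext) - 1)
                self.first[top] = entry
            block = self.ext[entry[1]]
            low = value & ((1 << (length - self.f)) - 1)
            start = low << (self.w - length)
            for i in range(start, start + (1 << (self.w - length))):
                block[i] = hop

    def lookup(self, address: int) -> Optional[str]:
        entry = self.first[address >> (self.w - self.f)]   # first memory access
        if entry is None:
            return None
        kind, value = entry
        if kind == "hop":
            return value
        mask = (1 << (self.w - self.f)) - 1
        return self.ext[value][address & mask]              # second memory access


t = TwoStageTable(width=16, first_bits=12)   # small widths for illustration
t.build({(0b1010, 4): "A", (0b101000001111, 12): "B", (0b10100000111100, 14): "C"})
print(t.lookup(0b1010000011110011))  # C
print(t.lookup(0b1010000011110100))  # B
print(t.lookup(0b1010111100000000))  # A
```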
The technique of Gupta et al is in fact very efficient and economical to implement for IPv4 with its current prefix distribution. However, it suffers from a major shortcoming: it relies heavily on the assumption that the number of prefixes longer than 24 bits is relatively small. With the growth of the Internet, and as new prefixes are assigned to new users, prefix lengths are likely to spread, which would render the technique impractical. For example, if only 10% of the entries in the first index stage extended to the second stage, the second memory would need over 400 million entries, each including a translation. Furthermore, the address length, and hence the prefix length, may well be extended to accommodate future demand. A 5-byte address, for example, would require excessive memory sizes. The memory size can be reduced to some extent by using several indexing stages; however, the memory requirement would still be prohibitive. Another factor to take into account is that memory access speed decreases as memory storage capacity increases.
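The 10% example can be checked directly. The first stage has 2^24 entries, and each one that extends needs a 256-entry array, giving roughly 429 million second-stage entries when truncation is ignored. The 32+8 split shown for a 5-byte address is a hypothetical extrapolation, used only to indicate the scale of the first stage alone. The helper `second_stage_entries` is hypothetical.

```python
def second_stage_entries(first_bits: int, ext_bits: int, fraction: float) -> float:
    """Second-memory entries when `fraction` of the 2**first_bits first-stage
    entries each point to an untruncated extension array of 2**ext_bits entries."""
    return fraction * 2 ** first_bits * 2 ** ext_bits


# IPv4 split used by Gupta et al.: 24 first-stage bits, 8 extension bits.
print(f"{2 ** 24:,} first-stage entries")                        # 16,777,216 first-stage entries
print(f"{second_stage_entries(24, 8, 0.10) / 1e6:.0f} million")  # 429 million

# Hypothetical 5-byte (40-bit) address split 32 + 8: the first stage alone
# is as large as a single-stage IPv4 memory.
print(f"{2 ** 32:,} first-stage entries")                        # 4,294,967,296 first-stage entries
```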