1. Field of the Invention
The present invention relates to distributed computing, and deals more particularly with a method, system, and computer program product for quickly and efficiently looking up IP (Internet Protocol) addresses, for example in routing tables used by network routers.
2. Description of the Related Art
Business and consumer use of distributed computing, also commonly referred to as network computing, has gained tremendous popularity in recent years. In this computing model, the data and/or programs to be used to perform a particular computing task typically reside on (i.e. are “distributed” among) more than one computer, where these multiple computers are connected by a network of some type. The Internet, and the part of the Internet known as the World Wide Web (hereinafter, “Web”), are well-known examples of this type of environment wherein the multiple computers are connected using a public network. Other types of network environments in which distributed computing may be used include intranets, which are typically private networks accessible to a restricted set of users (such as employees of a corporation), and extranets (e.g., a corporate network which is accessible to other users than just the employees of the company which owns and/or manages the network, such as the company's business partners).
The Internet Protocol (IP) is used to enable the interconnected networks of the Internet to communicate with each other. In the most commonly used version of IP, which is known as “IPv4”, the network address of each sender and receiver of information is specified as a 4-byte (32-bit) number. The leftmost bits of an IP address uniquely identify the network in which a particular device is located, and the rightmost bits uniquely identify the device within that network. The bits comprising the network identification are commonly referred to as the network number or network prefix of an IP address, while the rightmost are commonly referred to as the local address or host address.
Originally, IP addresses were divided into 4 classes, referred to as Class A, Class B, Class C, and Class D. In this addressing scheme, the bit settings in the leftmost 4 bits of the 32-bit address identify the class for a particular IP address. Each different class uses a different boundary for distinguishing which part of a 32-bit address is considered to be the network prefix, and which part is considered to be the local address. A network address from a Class A network uses 7 bits for the network prefix and 24 bits for the local address. A Class B network uses 14 bits for the network prefix and 16 bits for the local address. A Class C network uses 21 bits for the network prefix and 8 bits for the local address, and a Class D network uses 28 bits as a multicast address. The 32-bit addresses are commonly expressed using what is known as “dotted quad” notation, where each 8-bit byte of an address is converted to a decimal representation and the 4 decimal numbers are then written as a string separated by periods. Thus, the dotted quad representation of a Class A network address specifies the network number as the first decimal number, followed by 3 decimal numbers identifying the local address; the dotted quad representation of a Class B network address, on the other hand, can be interpreted as “network.network.local.local”.
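By way of illustration only, the dotted quad notation and the class boundaries described above may be sketched as follows. The function names are hypothetical and are not drawn from any particular implementation; the sketch assumes that the leading bit patterns 0, 10, 110, and 1110 identify Classes A through D respectively, which is consistent with the prefix lengths given above.

```python
def from_dotted_quad(text: str) -> int:
    """Parse 'a.b.c.d' into a 32-bit integer (hypothetical helper)."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected four decimal bytes separated by periods")
    addr = 0
    for p in parts:
        addr = (addr << 8) | p
    return addr


def to_dotted_quad(addr: int) -> str:
    """Render a 32-bit integer as four decimal bytes separated by periods."""
    return ".".join(str((addr >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def address_class(addr: int) -> str:
    """Classify an address by its leftmost 4 bits."""
    top4 = (addr >> 28) & 0xF
    if top4 & 0b1000 == 0b0000:
        return "A"      # 0xxx
    if top4 & 0b1100 == 0b1000:
        return "B"      # 10xx
    if top4 & 0b1110 == 0b1100:
        return "C"      # 110x
    if top4 == 0b1110:
        return "D"      # 1110, multicast
    return "other"      # 1111, outside the four classes described above


# Number of local-address bits for the classes that have a network/local split.
CLASS_LOCAL_BITS = {"A": 24, "B": 16, "C": 8}


def split_classful(addr: int) -> tuple[int, int]:
    """Return (network part, local address) for a Class A, B, or C address.

    The network part returned here still includes the leading class bits,
    so for Class A it is the whole first byte rather than only 7 bits.
    """
    local_bits = CLASS_LOCAL_BITS.get(address_class(addr))
    if local_bits is None:
        raise ValueError("address has no network/local split in this scheme")
    return addr >> local_bits, addr & ((1 << local_bits) - 1)
```

For example, under these assumptions `address_class(from_dotted_quad("172.16.0.1"))` yields `"B"`, matching the “network.network.local.local” interpretation.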
A 3-level addressing scheme may alternatively be represented within the 32 bits, where the additional level is used to group local addresses into subnetwork, or subnet, addresses. The combination of the network prefix with the subnet address is referred to as an “extended network prefix”. A subnet mask, which is a 32-bit number specified as a series of contiguous 1-bits on the left and 0-bits on the right, is used to indicate where the extended network prefix ends and the local address begins.
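The role of the subnet mask may be sketched, again purely for illustration, as a pair of bitwise operations. The helper names below are hypothetical, and the sketch assumes addresses and masks are held as unsigned 32-bit integers.

```python
ALL_ONES = 0xFFFFFFFF


def mask_from_prefix_length(length: int) -> int:
    """Build a mask with `length` contiguous 1-bits on the left."""
    if not 0 <= length <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return (ALL_ONES << (32 - length)) & ALL_ONES


def is_contiguous_mask(mask: int) -> bool:
    """True if the mask is 1-bits on the left followed only by 0-bits."""
    inverted = ~mask & ALL_ONES
    # A run of low-order 1-bits plus one is a power of two (or overflows to
    # 2**32), so it shares no bits with the run itself.
    return (inverted & (inverted + 1)) == 0


def split_with_mask(addr: int, mask: int) -> tuple[int, int]:
    """Return (extended network prefix, local address) under the given mask."""
    return addr & mask, addr & ~mask & ALL_ONES
```

As a usage example under these assumptions, applying the mask 255.255.255.0 (`0xFFFFFF00`) to 172.16.5.9 (`0xAC100509`) gives an extended network prefix of `0xAC100500` and a local address of 9.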
The gateways and routers which are responsible for routing data packets through a distributed network must store information about the path to use in order to route the packets destined for a particular address. This information is stored in routing tables (sometimes referred to as a “routing cache”). Use of subnet addressing enables reducing the number of entries in a routing table, because routers and gateways which are external to an organization's private network need only contain a routing table entry for the organization's network address: the organization's internal routers then handle the routing among subnetworks using the subnet address specified in a data packet.
Several problems were encountered with the 4-class addressing scheme as distributed computing gained in popularity. It appeared as though the range of available address numbers would soon be exhausted, and many of the addresses which had already been assigned were not being used, due to the inflexible boundaries dictated by this addressing scheme.
A technique known as Classless Inter-Domain Routing (CIDR) was developed to address these problems. In CIDR, a network mask value is used to determine the boundary between the network address (including the subnet address) and the local address, without regard to any notion of class structures. CIDR requires that routers and gateways use a consistent, longest-match algorithm for forwarding data packets. In this algorithm, that part of the destination address in a data packet which is identified as being the network address (using the network mask) is compared to entries in a routing table (where the table entries are generally also specified in terms of a network mask) to locate the path to be used for forwarding the packet. If more than one routing table entry matches the destination address, then the more specific entry (which is the entry having the longest network prefix) must be used.
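The longest-match rule may be illustrated with a deliberately naive linear scan over the table. This sketch is not the technique of the present invention, nor that of any particular router; the `Route` type, its field names, and the next-hop labels are assumptions introduced solely for the example.

```python
from typing import Iterable, NamedTuple, Optional


class Route(NamedTuple):
    """Hypothetical routing table entry."""
    network: int    # network address, already masked
    mask: int       # contiguous network mask
    next_hop: str   # illustrative label for the forwarding path


def prefix_length(mask: int) -> int:
    """Number of 1-bits in a contiguous network mask."""
    return bin(mask & 0xFFFFFFFF).count("1")


def longest_match(table: Iterable[Route], dest: int) -> Optional[Route]:
    """Return the matching entry with the longest network prefix, if any."""
    best: Optional[Route] = None
    for route in table:
        # An entry matches when the masked destination equals its network.
        if (dest & route.mask) == route.network:
            if best is None or prefix_length(route.mask) > prefix_length(best.mask):
                best = route
    return best


# Illustrative table: a default route, 10.0.0.0/8, and 10.1.0.0/16.
TABLE = [
    Route(0x00000000, 0x00000000, "default"),
    Route(0x0A000000, 0xFF000000, "hop-A"),
    Route(0x0A010000, 0xFFFF0000, "hop-B"),
]
```

With this table, a destination of 10.1.2.3 matches all three entries and the /16 entry is selected, whereas 10.2.0.1 matches only the default and the /8 entries, so the /8 entry is selected. The cost of the scan grows with the number of entries, which is one reason faster lookup structures are of interest.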
Even though CIDR provides interim relief for address assignment, 32 bits is deemed to be insufficient for supporting the future growth of distributed computing. Thus, a newer addressing scheme known as “IPv6” has been defined which uses 128-bit addresses. This addressing scheme has not yet been widely implemented.
Routers and gateways (referred to hereinafter as routers, for ease of reference) must be able to quickly evaluate the IP address in a data packet in order to determine how to route the packet while providing an acceptable level of performance and throughput. As link speeds are increasing, the number of IP packets which a router is required to process per second is becoming very high. One critical factor in the router's performance and throughput is the route lookup technique used with the routing tables.
Most implementations of routing tables today use radix trees. Radix trees require a significant amount of programming logic, and expenditure of a significant amount of computing time in traversing the trees to find a particular route. Furthermore, radix trees cannot exploit a multi-processor (MP) approach wherein the computing task is shared among processors. Another existing technique is use of sorted linked lists. Linked lists have well-known performance problems, and are also not MP-exploitable. Some existing implementations use hash tables along with sorted linked lists. This approach provides performance which is significantly better than linked lists alone, but still does not provide an optimal (nor an MP-exploitable) solution. A technique designated “DIR-24-8-BASIC” was proposed by Pankaj Gupta et al. at INFOCOM 1998, where two separate routing tables are used: one table for routes which are less than 25 bits long, and a different table for routes which are 25 bits or longer. This technique, however, assumes that most routes have prefixes of 24 bits or less and is therefore thought by the inventor of the present invention to have a rather narrow focus. (A copy of this conference paper was published on the Internet at a web page of Stanford University.) A technique known as “Multi-Protocol Label Switching” (MPLS) has also been proposed, where this technique would replace the longest-prefix match approach with a simple direct lookup. This approach, however, would require adoption of new protocol standards, and thus is neither easily nor quickly adaptable into the established distributed computing infrastructure.
In addition to the need to inspect IP addresses for routing purposes, there are a number of new servers or gateways which provide a specific service based upon the IP address of the client. Examples include firewalls, IPSec (IP Security) gateways, and Dynamic Host Configuration Protocol (DHCP) servers. (For example, DHCP servers, which manage the assignment of IP addresses to hosts, generally function differently depending on whether the host is a client or a server.) The performance of these servers and gateways will be constrained by the time it takes to inspect the IP address and determine which service to provide.
Accordingly, what is needed is an improved technique for evaluating or interpreting IP address values.