1. Field of the Invention
The present invention relates to clustered computer systems with multiple nodes that provide services in a scalable manner. More specifically, the present invention relates to a method and an apparatus that uses a destination address to perform a fast lookup to determine a service for a packet.
2. Related Art
The recent explosive growth of electronic commerce has led to a proliferation of web sites on the Internet selling products as diverse as toys, books and automobiles, and providing services, such as insurance and stock trading. Millions of consumers are presently surfing through web sites in order to gather information, to make purchases, or purely for entertainment.
The increasing traffic on the Internet often places a tremendous load on the servers that host web sites. Some popular web sites receive over a million “hits” per day. In order to process this much traffic without subjecting web surfers to annoying delays in retrieving web pages, it is necessary to distribute the traffic between multiple server nodes, so that the multiple server nodes can operate in parallel to process the traffic.
In designing such a system to distribute traffic between multiple server nodes, a number of characteristics are desirable. It is desirable for such a system to be efficient in order to accommodate as much traffic as possible with a minimal amount of response time. It is desirable for such a system to be “scalable,” so that additional server nodes can be added and the distribution to the nodes can be modified to provide a service as demand for the service increases. In doing so, it is important to ensure that response time does not increase as additional server nodes are added. It is also desirable for such a system to be constantly available, even when individual server nodes or communication pathways between server nodes fail.
A system that distributes traffic between multiple server nodes typically performs a number of tasks. Upon receiving a packet, the system looks up a service that the packet is directed to. (Note that a collection of server nodes will often host a number of different servers.) What is needed is a method and an apparatus for performing a service lookup that is efficient, scalable and highly available.
Once the service is determined, the system distributes workload involved in providing the service between the server nodes that are able to provide the service. For efficiency reasons it is important to ensure that packets originating from the same client are directed to the same server. What is needed is a method and an apparatus for distributing workload between server nodes that is efficient, scalable and highly available.
Once a server node is selected for the packet, the packet is forwarded to the server node. The conventional technique of using a remote procedure call (RPC) or an interface definition language (IDL) call to forward a packet typically involves traversing an Internet Protocol (IP) stack from an RPC/IDL endpoint to a transport driver at the sender side, and then traversing another IP stack on the receiver side, from a transport driver to an RPC/IDL endpoint. Note that traversing these two IP stacks is highly inefficient. What is needed is a method and an apparatus for forwarding packets to server nodes that is efficient, scalable and highly available.
One embodiment of the present invention provides a system that uses a destination address of a packet to perform a fast lookup to determine a service that is specified by the destination address. The system initially receives a packet at an interface node in a cluster of nodes. This packet includes a source address specifying a location of a client that the packet originated from, and the destination address specifying a service provided by the cluster of nodes. The system uses the destination address to perform a first lookup into a first lookup structure containing identifiers for scalable services. Note that a scalable service is a service for which more server node capacity can be provided as demand for the service increases. If no identifier for a scalable service is returned during the first lookup, the system sends the packet to a server node in the cluster of nodes that provides a non-scalable service.
In one embodiment of the present invention, if an identifier for a scalable service is returned for the packet, the system looks up a server node to send the packet to, based upon the source address of the packet (and possibly the destination address of the packet) and sends the packet to the server node.
In one embodiment of the present invention, the system looks up the server node by performing a function that maps the source address to an entry in a packet distribution table (PDT), which includes entries containing identifiers for server nodes. In a variation on this embodiment, the function is a hash function that maps different source addresses to different entries in the packet distribution table in a substantially random manner, so that a given source address always maps to the same entry in the packet distribution table.
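The mapping from a source address to a packet distribution table entry can be illustrated with a minimal sketch. The function names, the table contents, and the use of SHA-256 are assumptions made for illustration only; the text above does not specify a particular hash function or table size.

```python
import hashlib
import ipaddress

# Hypothetical PDT: each entry holds the identifier of a server node.
# The node names and the table size are illustrative assumptions.
pdt = ["node-a", "node-b", "node-c", "node-a",
       "node-b", "node-c", "node-a", "node-b"]

def pdt_index(source_ip: str, table_size: int) -> int:
    """Map a source address to a PDT slot.

    SHA-256 stands in for whatever hash function an implementation uses.
    The result is deterministic, so a given source address always maps
    to the same slot.
    """
    packed = ipaddress.ip_address(source_ip).packed
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest[:8], "big") % table_size

def select_server(source_ip: str) -> str:
    """Return the server node identifier for a client's source address."""
    return pdt[pdt_index(source_ip, len(pdt))]
```

Because the slot depends only on the source address, repeated packets from one client reach the same server node, and capacity can be shifted by rewriting table entries rather than by changing the hash.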
In one embodiment of the present invention, the system allows the server node to send return communications directly to the client without forwarding the return communications through the interface node.
In one embodiment of the present invention, the first lookup structure is a hash table containing the identifiers for the scalable services.
In one embodiment of the present invention, if the first lookup does not return an identifier for a scalable service, the system uses the destination address to perform a second lookup into a second lookup structure containing identifiers for scalable services. In a variation on this embodiment, the first lookup is based upon an Internet Protocol (IP) address and an associated port number, and the second lookup is based upon the IP address without the associated port number.
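The two-stage lookup described above might be sketched as follows, assuming each lookup structure is an ordinary hash table. The dictionary names, addresses, and service identifiers are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical first lookup structure, keyed by (IP address, port number).
by_addr_and_port = {("203.0.113.5", 80): "web"}

# Hypothetical second lookup structure, keyed by IP address alone.
by_addr_only = {"203.0.113.7": "affinity-group"}

def find_scalable_service(ip: str, port: int) -> Optional[str]:
    """Return a scalable service identifier, or None if neither lookup matches."""
    service = by_addr_and_port.get((ip, port))
    if service is not None:
        return service
    # Second lookup: same IP address, port number ignored.
    return by_addr_only.get(ip)
```

A result of `None` corresponds to the case in which no scalable service is found, so the packet is handled as traffic for a non-scalable service.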
In one embodiment of the present invention, the first lookup structure includes identifiers for scalable services that use a first load balancing policy to distribute packets between server nodes, and the second lookup structure includes identifiers for scalable services that use a second load balancing policy. In a variation on this embodiment, the second load balancing policy locates related services for a given client on the same server node.
In one embodiment of the present invention, if no scalable service is returned for the packet, the system allows a server instance on the interface node to provide the service.
In one embodiment of the present invention, the system periodically sends checkpointing information from a PDT server node to a secondary PDT server node so that the secondary PDT server node is kept in a consistent state with the PDT server node. This allows the secondary PDT server node to take over for the PDT server node if the PDT server node fails.
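A minimal sketch of the checkpointing idea follows. It assumes the checkpoint is a full snapshot of the table tagged with a version counter; the class, its methods, and the version scheme are hypothetical, and a real system would send the snapshot over the cluster interconnect on a timer rather than through a direct call.

```python
class PdtServer:
    """Hypothetical PDT server holding a table of server node identifiers."""

    def __init__(self, table=None):
        self.table = list(table or [])
        self.version = 0  # incremented on every change to the table

    def update(self, index, node):
        """Reassign one PDT entry to a different server node."""
        self.table[index] = node
        self.version += 1

    def checkpoint(self):
        """Produce a snapshot to send to a secondary PDT server."""
        return {"version": self.version, "table": list(self.table)}

    def apply_checkpoint(self, snapshot):
        """Adopt a snapshot unless the local state is already newer."""
        if snapshot["version"] >= self.version:
            self.table = list(snapshot["table"])
            self.version = snapshot["version"]

primary = PdtServer(["node-a", "node-b"])
secondary = PdtServer()
primary.update(1, "node-c")
secondary.apply_checkpoint(primary.checkpoint())
```

After the checkpoint is applied, the secondary holds the same table as the primary and could answer lookups if the primary failed.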
In one embodiment of the present invention, the system periodically sends checkpointing information from a master PDT server node to at least one slave PDT server node so that each slave PDT server node is kept in a consistent state with the master PDT server node.
In one embodiment of the present invention, the destination address includes an Internet Protocol (IP) address, an associated port number for the service and a protocol identifier (such as transmission control protocol (TCP) or user datagram protocol (UDP)).
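One way to picture such a destination address is as a three-field key into a service table. The type name, field names, and sample entry below are assumptions made for illustration, not structures taken from the text.

```python
from typing import NamedTuple

class ServiceAddress(NamedTuple):
    """Hypothetical destination address: protocol, IP address, and port."""
    protocol: str  # for example "tcp" or "udp"
    ip: str
    port: int

# Hypothetical service table keyed by the full destination address.
services = {ServiceAddress("tcp", "203.0.113.5", 80): "web"}

tcp_hit = services.get(ServiceAddress("tcp", "203.0.113.5", 80))
udp_miss = services.get(ServiceAddress("udp", "203.0.113.5", 80))
```

Including the protocol identifier in the key keeps a TCP service and a UDP service on the same IP address and port number distinct.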