The present invention relates to clustered computer systems with multiple nodes that provide services in a scalable manner. More specifically, the present invention relates to a method and an apparatus that uses forwarding lists of existing connections to forward packets to server nodes of a cluster.
The recent explosive growth of electronic commerce has led to a proliferation of web sites on the Internet selling products as diverse as toys, books, and automobiles, and providing services, such as insurance and stock trading. Millions of consumers are presently surfing through web sites in order to gather information, to make purchases, or to be entertained.
The increasing traffic on the Internet often places a tremendous load on the servers that host web sites. Some popular web sites receive over a million "hits" per day. In order to process this much traffic without subjecting web surfers to annoying delays in retrieving web pages, it is necessary to distribute the traffic between multiple server nodes, so that the multiple server nodes can operate in parallel to process the traffic.
In designing such a system to distribute traffic between multiple server nodes, a number of characteristics are desirable. It is desirable for such a system to be efficient in order to accommodate as much traffic as possible with a minimal amount of response time. It is desirable for such a system to be "scalable," so that additional server nodes can be added and the balancing between the nodes can be modified to provide a service as demand for the service increases. In doing so, it is important to ensure that response time does not increase as additional server nodes are added. It is also desirable for such a system to be constantly available, even when individual server nodes or communication pathways between server nodes fail.
A system that distributes traffic between multiple server nodes typically performs a number of tasks. Upon receiving a packet, the system performs a lookup to determine whether the service the packet is meant for is a scalable service.
Once the service is determined to be a scalable service, the system distributes the workload involved in providing the service between the server nodes that are able to provide the service. What is needed are a method and an apparatus for distributing workload between server nodes that are efficient, scalable, and highly available, and that allow client affinity.
Once a server node is selected, the packet is forwarded to the server node. The conventional technique of using a remote procedure call (RPC) or an interface definition language (IDL) call to forward a packet typically involves traversing an Internet Protocol (IP) stack from an RPC/IDL endpoint to a transport driver at the sender side, and then traversing another IP stack on the receiver side, from a transport driver to an RPC/IDL endpoint. Note that traversing these two IP stacks is highly inefficient. What is needed are a method and an apparatus for forwarding packets to server nodes that are efficient, scalable, and highly available.
It is desirable to have a scalable service that is transparent to an application. This transparency allows one to write an application that can run on a scalable service or a non-scalable service. Such an application is typically easier to write, since it does not need to take scalability into account. In addition, a scalable service that is transparent to the client application would tend to be able to use existing client applications. A scalable service running such an application may run the application on a node of the scalable service. If a series of connections is required between the server and the client, one way of supporting this is for the nodes in the scalable service to share memory, so that if the client's messages go to different nodes, any node in the system is able to process a message by accessing the shared memory. Sharing memory sometimes slows down the system and may be cumbersome. For these reasons, it would be desirable to have all of the packets from one client for one connection go to the same node in a scalable system (client affinity). If the distribution of work between the nodes changes, it would be desirable to have packets of an existing connection continue to go to the same node until the connection is terminated.
It is desirable to provide the ability to send packets of an existing connection to the same node even when the workload is redistributed on a Solaris™ operating system, which provides clustering and scalable service. Solaris™ is manufactured by Sun Microsystems™ of Palo Alto, Calif.
One embodiment of the present invention provides a system that uses forwarding lists so that if the workload between nodes is redistributed, packets from an existing connection continue to be directed to the same server node until the connection is terminated.
Another embodiment of the present invention provides a method of distributing packets to server nodes in a cluster of nodes, comprising the steps of: receiving a packet at an interface node in the cluster of nodes, the packet including a source address; matching the packet with a service object; performing a function that maps the source address to a bucket of a plurality of buckets in a packet distribution table associated with the service object matched with the packet, the buckets containing identifiers for server nodes in the cluster of nodes; determining if the source address matches a listing in a forwarding list; if there is a match, sending the packet to a node indicated by the match; and, if there is not a match, sending the packet to a node identified by the bucket into which the source address of the packet is mapped.
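By way of illustration only, the steps recited above can be sketched in Python. The sketch is not the claimed implementation and rests on several assumptions that do not appear in the text: the names `ServiceObject`, `Packet`, and `dispatch` are hypothetical; a service object is looked up by destination address and port; CRC-32 modulo the number of buckets stands in for the unspecified mapping function; and, when a bucket is reassigned, sources with open connections in that bucket are entered in the forwarding list against the old node.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class ServiceObject:
    """Hypothetical per-service state: a packet distribution table and a forwarding list."""
    buckets: List[str]  # bucket index -> server node identifier
    forwarding: Dict[str, str] = field(default_factory=dict)  # source address -> pinned node

    def bucket_for(self, src: str) -> int:
        # Assumption: CRC-32 modulo the bucket count stands in for the mapping function.
        return zlib.crc32(src.encode("ascii")) % len(self.buckets)

    def route(self, src: str) -> str:
        # A forwarding-list match takes precedence over the bucket lookup.
        pinned = self.forwarding.get(src)
        if pinned is not None:
            return pinned
        return self.buckets[self.bucket_for(src)]

    def reassign(self, bucket: int, new_node: str, open_sources: Iterable[str]) -> None:
        # Assumption: sources with open connections in this bucket are pinned
        # to the old node before the bucket is handed to the new node.
        old_node = self.buckets[bucket]
        for src in open_sources:
            if self.bucket_for(src) == bucket:
                self.forwarding[src] = old_node
        self.buckets[bucket] = new_node

    def close(self, src: str) -> None:
        # When the connection terminates, the forwarding-list entry is dropped.
        self.forwarding.pop(src, None)


@dataclass(frozen=True)
class Packet:
    src: str       # source address
    dst: str       # destination address
    dst_port: int  # destination port


ServiceKey = Tuple[str, int]  # hypothetical key: (destination address, destination port)


def dispatch(services: Dict[ServiceKey, ServiceObject], packet: Packet) -> str:
    """Return the identifier of the server node that should receive the packet."""
    service = services[(packet.dst, packet.dst_port)]  # match the packet with a service object
    return service.route(packet.src)
```

In this sketch, a source address pinned by `reassign` keeps resolving to its original node until `close` is called, while new sources that map to the same bucket resolve to the newly assigned node.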
These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.