The present invention relates to clustered computer systems with multiple nodes that provide services in a scalable manner. More specifically, the present invention relates to a method and devices adapted to support client affinity in cluster computer systems.
The recent explosive growth of electronic commerce has led to a proliferation of web sites on the Internet selling products as diverse as toys, books, and automobiles, and providing services, such as insurance and stock trading. Millions of consumers are presently surfing through web sites in order to gather information, to make purchases, or to be entertained.
The increasing traffic on the Internet often places a tremendous load on the servers that host web sites. Some popular web sites receive over a million “hits” per day. In order to process this much traffic without subjecting web surfers to annoying delays in retrieving web pages, it is necessary to distribute the traffic between multiple server nodes, so that the multiple server nodes can operate in parallel to process the traffic.
In designing such a system to distribute traffic between multiple server nodes, a number of characteristics are desirable. It is desirable for such a system to be efficient in order to accommodate as much traffic as possible with a minimal amount of response time. It is desirable for such a system to be “scalable,” so that additional server nodes can be added and the balancing between the nodes can be modified to provide a service as demand for the service increases. In doing so, it is important to ensure that response time does not increase as additional server nodes are added. It is also desirable for such a system to be constantly available, even when individual server nodes or communication pathways between server nodes fail. It is desirable to provide a flexible system with different levels of client affinity.
A system that distributes traffic between multiple server nodes typically performs a number of tasks. Upon receiving a packet, the system performs a lookup to determine whether the service the packet is meant for is a scalable service.
Once the service is determined to be a scalable service, the system distributes the workload involved in providing the service between the server nodes that are able to provide the service. What is needed are a method and an apparatus for distributing workload between server nodes that are efficient, scalable, and highly available, and that allow client affinity.
Once a server node is selected, the packet is forwarded to the server node. The conventional technique of using a remote procedure call (RPC) or an interface definition language (IDL) call to forward a packet typically involves traversing an Internet Protocol (IP) stack from an RPC/IDL endpoint to a transport driver at the sender side, and then traversing another IP stack on the receiver side, from a transport driver to an RPC/IDL endpoint. Note that traversing these two IP stacks is highly inefficient. What is needed are a method and an apparatus for forwarding packets to server nodes that are efficient, scalable, and highly available.
It is desirable to have a scalable service that is transparent to an application. This transparency allows one to write an application that can run on a scalable service or a non-scalable service. Such an application is typically easier to write, since it does not need to take scalability into account. In addition, a scalable service that is transparent to the client application would tend to be able to use existing client applications. A scalable network running such an application may run the application on a node of the scalable service. If a series of connections is required between the server and the client, one way of supporting this is for the nodes in the scalable service to share memory so that, if the client's messages go to different nodes, any node in the system is able to process a message by accessing the shared memory. The sharing of memory sometimes slows down the system and may be cumbersome. For these reasons, it would be desirable to have all of the packets from one client for one connection go to the same node in a scalable system (client affinity). It is also desirable to provide client affinity that is transparent to the client application.
Certain applications that use HTTP (HyperText Transfer Protocol) sometimes require client affinity with the server. The HTTP protocol is basically stateless. However, certain higher-level applications that use HTTP (a good example being servlets) maintain state between HTTP sessions (where an HTTP session would be a TCP connection). A normal HTTP page may require one or more connections and, as a general rule, the server would not keep any state when the client is simply reading news or data. However, in the case of e-commerce, state could be maintained between TCP connections using client affinity. An example of this would be shopping at Amazon.com™, where one connection lets a user browse books and choose items to buy, while another connection is used to maintain the user's shopping basket through a mechanism of server session identifiers or cookies. In certain cases, it would be helpful for multiple connections from the same client to maintain some affinity with a particular server. In scalable services, the traffic coming to a particular shared IP address is distributed to any node of the cluster that is capable of satisfying the request. This distribution of the load is done on a per-packet basis in the case of UDP and on a per-connection basis in the case of TCP. Scalable services do not (and under normal circumstances should not) care what the application does with these requests. Such behavior could impair client affinity. Examples of services that are expected to have problems with a lack of client affinity are HTTP, secure HTTP, Java Servlets, FTP, passive mode FTP, and Real Audio®. It is desirable to provide different types of client affinity to scalable services, such as no affinity, client affinity, and wild card client affinity. It is desirable to provide different types of client affinity on a Solaris™ operating system, which provides clustering and scalable service.
Solaris™ is manufactured by Sun Microsystems™ of Palo Alto, Calif.
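The difference between per-packet distribution for UDP and per-connection distribution for TCP, and the resulting loss of client affinity, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Dispatcher` class, the node names, and in particular the round-robin selection policy, which the text does not specify.

```python
from itertools import cycle


class Dispatcher:
    """Hypothetical dispatcher for traffic arriving at one shared IP address."""

    def __init__(self, nodes):
        # Assumption: nodes are chosen round-robin; the text names no policy.
        self._next_node = cycle(nodes)
        # Remembers which node serves each open TCP connection.
        self._tcp_connections = {}

    def dispatch(self, protocol, src_ip, src_port, dst_ip, dst_port):
        if protocol == "udp":
            # Per-packet basis: every datagram is balanced independently.
            return next(self._next_node)
        if protocol != "tcp":
            raise ValueError(f"unsupported protocol: {protocol}")
        # Per-connection basis: all packets of one connection reach one node.
        connection = (src_ip, src_port, dst_ip, dst_port)
        if connection not in self._tcp_connections:
            self._tcp_connections[connection] = next(self._next_node)
        return self._tcp_connections[connection]


dispatcher = Dispatcher(["node-a", "node-b"])
# Two packets on the same TCP connection reach the same node.
first = dispatcher.dispatch("tcp", "198.51.100.7", 40001, "192.0.2.1", 80)
again = dispatcher.dispatch("tcp", "198.51.100.7", 40001, "192.0.2.1", 80)
# A second connection from the same client may reach a different node.
second = dispatcher.dispatch("tcp", "198.51.100.7", 40002, "192.0.2.1", 80)
```

In this sketch the second connection from the same client is assigned independently of the first, so state held by the first node (a shopping basket, for example) is not available to it. That is the situation client affinity is meant to avoid.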
One embodiment of the present invention provides a system that uses a packet distribution table to distribute packets to server nodes in a cluster of nodes that operate in concert to provide at least one service. The system operates by receiving a packet at an interface node in the cluster of nodes. This packet includes a source address specifying a location of a client from which the packet originated and a destination address specifying a service provided by the cluster of nodes. The system performs a function that maps the source address to an entry in a packet distribution table and retrieves an identifier specifying a server node from the entry in the packet distribution table. Next, the system forwards the packet to the server node specified by the identifier so that the server node can perform a service for the client. In this way, packets directed to a service specified by a single destination address are distributed across multiple server nodes in a manner specified by the packet distribution table. The invention provides different types of client affinity, so that the operator may define which services have the different types of client affinity.
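The lookup described in this embodiment can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the class name `PacketDistributionTable`, the use of CRC-32 as the mapping function, the table size, and the node identifiers are all assumptions chosen for the example.

```python
import ipaddress
import zlib


class PacketDistributionTable:
    """Hypothetical table mapping hash buckets to server node identifiers."""

    def __init__(self, node_ids, num_buckets=256):
        if not node_ids:
            raise ValueError("at least one server node is required")
        # Assumption: buckets are initially spread evenly over the nodes.
        # Rewriting individual entries would change how load is balanced.
        self.buckets = [node_ids[i % len(node_ids)] for i in range(num_buckets)]

    def bucket_for(self, source_ip):
        # Assumption: CRC-32 of the packed address stands in for the
        # unspecified function that maps a source address to a table entry.
        packed = ipaddress.ip_address(source_ip).packed
        return zlib.crc32(packed) % len(self.buckets)

    def lookup(self, source_ip):
        # Retrieve the identifier of the server node stored in the entry.
        return self.buckets[self.bucket_for(source_ip)]


pdt = PacketDistributionTable(["node-a", "node-b", "node-c"], num_buckets=16)
# The interface node would forward the packet to the node returned here.
target = pdt.lookup("198.51.100.7")
```

Because the mapping depends only on the source address, repeated packets from one client resolve to the same table entry, and therefore to the same server node for as long as that entry is unchanged.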
Another embodiment of the present invention provides a method of distributing packets to server nodes in a cluster of nodes, comprising the steps of: receiving a packet that is directed to a selected service supported by the cluster, wherein the selected service can be provided by a plurality of nodes in the cluster; determining an appropriate server node based at least in part on whether the service designates client affinity; and passing the received packet to the appropriate server node.
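The determining step of this method can be sketched as below. The sketch rests on assumptions that this section does not state: that a service designating client affinity is keyed on the source address alone, that a service without affinity is keyed on the source address and source port together, and that CRC-32 serves as the hash. The `Packet` and `Service` types and the `select_node` function are hypothetical names.

```python
import ipaddress
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass(frozen=True)
class Service:
    dst_ip: str
    dst_port: int
    nodes: tuple           # identifiers of nodes able to provide the service
    client_affinity: bool  # whether the service designates client affinity


def select_node(packet, service):
    """Choose a server node, taking the service's affinity setting into account."""
    key = ipaddress.ip_address(packet.src_ip).packed
    if not service.client_affinity:
        # Assumption: without affinity, the source port is added to the key
        # so that separate connections from one client can spread out.
        key += packet.src_port.to_bytes(2, "big")
    return service.nodes[zlib.crc32(key) % len(service.nodes)]


web = Service("192.0.2.1", 80, ("node-a", "node-b", "node-c"), client_affinity=True)
browse = Packet("198.51.100.7", 40001, "192.0.2.1", 80)
basket = Packet("198.51.100.7", 40002, "192.0.2.1", 80)
# With affinity designated, both connections resolve to the same node.
same_node = select_node(browse, web) == select_node(basket, web)
```

Under these assumptions, the operator controls the behavior per service simply by setting `client_affinity`; the packet would then be passed to whichever node `select_node` returns.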
These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.