1. Field of the Invention
The present invention relates to clustered computer systems with multiple nodes that provide services in a scalable manner. More specifically, the present invention relates to a method and an apparatus that performs fast packet forwarding between an interface node that receives a packet and a server node that provides a service associated with the packet.
2. Related Art
The recent explosive growth of electronic commerce has led to a proliferation of web sites on the Internet selling products as diverse as toys, books and automobiles, and providing services, such as insurance and stock trading. Millions of consumers are presently surfing through web sites in order to gather information, to make purchases, or purely for entertainment.
The increasing traffic on the Internet often places a tremendous load on the servers that host web sites. Some popular web sites receive over a million "hits" per day. In order to process this much traffic without subjecting web surfers to annoying delays in retrieving web pages, it is necessary to distribute the traffic between multiple server nodes, so that the multiple server nodes can operate in parallel to process the traffic.
In designing such a system to distribute traffic between multiple server nodes, a number of characteristics are desirable. It is desirable for such a system to be efficient in order to accommodate as much traffic as possible with a minimal amount of response time. It is desirable for such a system to be "scalable," so that additional server nodes can be added, and the distribution of a service across the nodes can be modified, as demand for the service increases. In doing so, it is important to ensure that response time does not increase as additional server nodes are added. It is also desirable for such a system to be constantly available, even when individual server nodes or communication pathways between server nodes fail.
A system that distributes traffic between multiple server nodes typically performs a number of tasks. Upon receiving a packet, the system looks up the service to which the packet is directed. (Note that a collection of server nodes will often host a number of different services.) What is needed is a method and an apparatus for performing a service lookup that is efficient, scalable and highly available.
Once the service is determined, the system distributes the workload involved in providing the service among the server nodes that are able to provide the service. For efficiency reasons, it is important to ensure that packets originating from the same client are directed to the same server node. What is needed is a method and an apparatus for distributing workload between server nodes that is efficient, scalable and highly available.
Once a server node is selected for the packet, the packet is forwarded to the server node. The conventional technique of using a remote procedure call (RPC) or an interface definition language (IDL) call to forward a packet typically involves traversing an Internet Protocol (IP) stack on the sender side, from an RPC/IDL endpoint down to a transport driver, and then traversing another IP stack on the receiver side, from a transport driver up to an RPC/IDL endpoint. Traversing two full IP stacks for every forwarded packet is highly inefficient. What is needed is a method and an apparatus for forwarding packets to server nodes that is efficient, scalable and highly available.
One embodiment of the present invention provides a system for forwarding a packet between nodes in a clustered computing system. The system operates by receiving the packet at an interface node in the clustered computing system. This packet includes a source address specifying a location of a client that the packet originated from, and a destination address specifying a service provided by the clustered computing system. The system selects, from a plurality of server nodes that are able to provide the service, a server node in the clustered computing system to which the packet is to be sent. Next, the system forwards the packet to the server node, so that the server node can provide the service to the client, by attaching a transport header containing an address of the server node to the packet and sending the packet to the server node through an interface. This interface is used for communications between the interface node and other nodes in the clustered computing system.
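The forwarding path described above can be illustrated with a minimal sketch. It assumes a simplified in-memory model: the names `Packet`, `Frame`, `PrivateInterface` and `forward` are hypothetical, the transport header is reduced to a single server-address field, and the server-selection policy is passed in as a function rather than fixed. The point it shows is that the original packet is carried unchanged and only a header naming the server node is added before the packet is handed to the cluster-internal interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Packet:
    src: str        # client address (source address of the packet)
    dst: str        # service address (destination address of the packet)
    payload: bytes


@dataclass(frozen=True)
class Frame:
    server_addr: str  # hypothetical transport header: address of the chosen server node
    packet: Packet    # original packet, carried unmodified


@dataclass
class PrivateInterface:
    """Hypothetical stand-in for the cluster-internal interface; records sent frames."""
    sent: List[Frame] = field(default_factory=list)

    def send(self, frame: Frame) -> None:
        self.sent.append(frame)


def forward(packet: Packet, servers: List[str],
            select: Callable[[Packet, List[str]], str],
            interface: PrivateInterface) -> str:
    """Select a server node, attach a transport header, and send via the interface."""
    server = select(packet, servers)        # pick one of the nodes able to provide the service
    interface.send(Frame(server, packet))   # the header names the server; the packet is untouched
    return server
```

For example, `forward(pkt, ["node1", "node2"], lambda p, s: s[-1], iface)` sends the packet to `node2`; a real selection policy would replace the lambda.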
In one embodiment of the present invention, in forwarding the packet to the server node, the system load balances between multiple redundant paths between the interface node and the server node.
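One way to load balance across redundant paths is sketched below, assuming simple round-robin rotation that skips paths marked as failed. The class `PathBalancer` and the path names are hypothetical, and round-robin is only one possible policy; because it can spread the packets of a single connection over several paths (and so risk reordering), a per-flow hash onto the available paths is a common alternative.

```python
from typing import Iterable, List, Set


class PathBalancer:
    """Round-robin choice among redundant paths, skipping paths marked down (hypothetical)."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._paths: List[str] = list(paths)
        if not self._paths:
            raise ValueError("at least one path is required")
        self._down: Set[str] = set()
        self._next = 0

    def mark_down(self, path: str) -> None:
        self._down.add(path)

    def mark_up(self, path: str) -> None:
        self._down.discard(path)

    def choose(self) -> str:
        # Try each path at most once, starting where the previous choice left off.
        for _ in range(len(self._paths)):
            path = self._paths[self._next]
            self._next = (self._next + 1) % len(self._paths)
            if path not in self._down:
                return path
        raise RuntimeError("no redundant path is available")
```

Marking a path down lets traffic continue over the remaining paths, which matches the availability goal of keeping redundant paths between the interface node and the server node.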
In one embodiment of the present invention, the packet includes an Internet Protocol (IP) header.
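Because the packet carries an IP header, the interface node can read the client (source) address, the service (destination) address and the protocol directly from it. The sketch below assumes IPv4 and parses only the fixed 20-byte portion of the header; the function name `parse_ipv4_header` is hypothetical, and options beyond the fixed header are not decoded.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Extract addressing fields from the fixed 20-byte part of an IPv4 header."""
    if len(packet) < 20:
        raise ValueError("packet too short for an IPv4 header")
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    return {
        "src": socket.inet_ntoa(src),          # client address
        "dst": socket.inet_ntoa(dst),          # service address
        "protocol": proto,                     # e.g. 6 = TCP, 17 = UDP
        "header_len": (ver_ihl & 0x0F) * 4,    # IHL is counted in 32-bit words
        "total_len": total_len,
        "ttl": ttl,
    }
```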
In one embodiment of the present invention, the system additionally receives the packet at the server node, strips the transport header from the packet, and places the packet on an IP stack at the server node.
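On the server node, the receive side is the inverse of encapsulation: drop the transport header and pass what remains to the local IP input routine. This minimal sketch assumes a fixed 14-byte transport header (the size of an Ethernet II header); `TRANSPORT_HEADER_LEN`, `deliver_to_ip_stack` and the `ip_input` callback are hypothetical stand-ins for the node's data-link and IP layers.

```python
from typing import Callable

TRANSPORT_HEADER_LEN = 14  # assumption: Ethernet II-sized transport header


def deliver_to_ip_stack(frame: bytes, ip_input: Callable[[bytes], None]) -> None:
    """Strip the transport header and hand the enclosed IP packet to the IP stack."""
    if len(frame) <= TRANSPORT_HEADER_LEN:
        raise ValueError("frame does not carry an IP packet")
    ip_input(frame[TRANSPORT_HEADER_LEN:])  # only one IP stack is traversed, on this node
```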
In one embodiment of the present invention, the system ensures that an IP address of the service is hosted on a loopback interface of the server node so that the packet will be accepted by the server node.
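The forwarded packet still carries the service's IP address as its destination, so the server node's IP stack accepts it only if that address is local to the node. The sketch below models this acceptance check, assuming a node keeps a set of addresses per interface; the class `NodeAddresses` and the interface names (such as `lo0:1` for a loopback alias) are hypothetical.

```python
import ipaddress
from typing import Dict, Set


class NodeAddresses:
    """Addresses configured on a node's interfaces (hypothetical model)."""

    def __init__(self) -> None:
        self.by_interface: Dict[str, Set] = {}

    def add(self, ifname: str, addr: str) -> None:
        self.by_interface.setdefault(ifname, set()).add(ipaddress.ip_address(addr))

    def accepts(self, dst: str) -> bool:
        # A packet is accepted locally if its destination matches any configured address,
        # including a service address hosted on the loopback interface.
        target = ipaddress.ip_address(dst)
        return any(target in addrs for addrs in self.by_interface.values())
```

Adding the service address as a loopback alias makes it local to the server node without assigning it to the node's external interfaces.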
In one embodiment of the present invention, the system allows the server node to send return communications directly to the client without forwarding the return communications through the interface node.
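Return traffic can bypass the interface node because the server node answers using the service address as its source address and the client address as its destination. This sketch shows only the endpoint swap; the `Datagram` type and `make_reply` function are hypothetical, and routing the reply out of the server node's own external link is assumed rather than shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datagram:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


def make_reply(request: Datagram, payload: bytes) -> Datagram:
    """Build a reply that appears to come from the service address, addressed to the client."""
    return Datagram(
        src_ip=request.dst_ip, src_port=request.dst_port,   # service address and port
        dst_ip=request.src_ip, dst_port=request.src_port,   # client address and port
        payload=payload,
    )
```

Since the client sees replies from the same address it contacted, it cannot tell which server node answered.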
In one embodiment of the present invention, the system selects the server node based on the source address of the packet (and possibly the destination address in the packet).
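Selecting the server node from the source address gives client affinity: every packet from the same client maps to the same node. A minimal sketch, assuming a CRC-32 hash of the source address (optionally combined with the destination) taken modulo the number of server nodes; `select_server` is hypothetical, and `zlib.crc32` is used instead of Python's built-in `hash` because the latter is randomized per process for strings.

```python
import zlib
from typing import List, Optional


def select_server(servers: List[str], src_ip: str, dst: Optional[str] = None) -> str:
    """Map a client (and optionally the destination) to one server node deterministically."""
    if not servers:
        raise ValueError("no server node can provide the service")
    key = src_ip if dst is None else f"{src_ip}|{dst}"
    return servers[zlib.crc32(key.encode()) % len(servers)]
```

A plain modulo remaps most clients whenever a node is added or removed; a lookup table of hash buckets assigned to nodes, or consistent hashing, would limit that disruption as the cluster grows.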
In one embodiment of the present invention, the interface is a private interface, and is coupled to a communication channel adhering to either the Ethernet standard or the Scalable Coherent Interface (SCI) standard.
In one embodiment of the present invention, the transport header is a data link provider interface (DLPI) header, which includes a medium access control (MAC) address of the server node.
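When the private interface is Ethernet, a transport header carrying the server node's MAC address can be approximated by an Ethernet II header prepended to the IP packet. This sketch is an assumption made for illustration: it builds the on-wire header bytes directly and does not use the DLPI programming interface itself; `mac_to_bytes`, `encapsulate` and the example MAC addresses are hypothetical.

```python
import struct

ETHERTYPE_IPV4 = 0x0800


def mac_to_bytes(mac: str) -> bytes:
    """Convert a colon-separated MAC address such as '02:00:00:00:00:02' to 6 bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    return bytes(int(p, 16) for p in parts)


def encapsulate(ip_packet: bytes, dst_mac: str, src_mac: str) -> bytes:
    """Prepend a 14-byte Ethernet II header addressed to the server node's MAC."""
    header = struct.pack("!6s6sH", mac_to_bytes(dst_mac), mac_to_bytes(src_mac),
                         ETHERTYPE_IPV4)
    return header + ip_packet
```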
In one embodiment of the present invention, the destination address includes an Internet Protocol (IP) address, an associated port number for the service and a protocol identifier (such as transmission control protocol (TCP) or user datagram protocol (UDP)).
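Since a service is identified by its IP address, port number and protocol, the service lookup can be modeled as a table keyed on that triple and mapping to the server nodes able to provide the service. The `ServiceKey` and `ServiceTable` names below are hypothetical, and the plain dictionary stands in for whatever replicated structure a highly available cluster would actually use.

```python
from typing import Dict, List, NamedTuple, Optional


class ServiceKey(NamedTuple):
    ip: str
    port: int
    protocol: str  # "tcp" or "udp"


class ServiceTable:
    """Map (IP address, port, protocol) to the server nodes providing that service."""

    def __init__(self) -> None:
        self._services: Dict[ServiceKey, List[str]] = {}

    def register(self, ip: str, port: int, protocol: str, servers: List[str]) -> None:
        self._services[ServiceKey(ip, port, protocol.lower())] = list(servers)

    def lookup(self, ip: str, port: int, protocol: str) -> Optional[List[str]]:
        # Returns None when no service is registered for this destination.
        return self._services.get(ServiceKey(ip, port, protocol.lower()))
```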