A high availability system (HA system) denotes a server system that is always on. Always-on functionality can be achieved by using not a single server but a group of servers, generally termed a server cluster. Always-on functionality requires that the system be fault-tolerant, redundant, and scalable.
Relating to a server cluster, fault-tolerance means that if one of the servers crashes, the server cluster can still serve its clients. The clients are applications sending requests and communicating with the server cluster. A server may crash, for example, due to a software or hardware failure. Then the other servers of the fault-tolerant server cluster continue to handle the requests which had been initially addressed to the server that crashed.
A server cluster is termed redundant when it includes several servers so that some of them can be in a waiting state during low load. The number of requests correlates with the load of a server cluster. When there are a lot of requests, the server cluster load is usually high. During high load all or most of the servers are in a busy state handling the requests.
A server cluster is termed scalable when its architecture is such that one or more servers can be added to the server cluster. The capacity requirements of a server cluster may increase over time, so it is important that new servers can be added to the server cluster without lengthy service interruptions.
Load balancing means that the servers of a server cluster can share load with each other. Compaq Tru Cluster© (Compaq Computers, Houston Tex., USA), Sun Full-cluster© (Sun Microsystems, Palo Alto, Calif., USA), and HP-Service guard© (HP, Palo Alto, Calif., USA) are examples of server clusters that include hardware load balancing. The Stonebeat© product of Stonesoft Corporation (Helsinki, Finland) achieves load balancing in software.
Web servers, gateways, and accelerating servers are examples of equipment that is often implemented as a server cluster. A web server operates as a node of the Internet and a gateway transmits data between two networks. The third example may be less well known. An accelerating server accelerates network traffic to achieve better network utilization or performance. Sometimes acceleration actions do not increase the number of transmitted bits but still improve the user experience. For example, reducing the size of transmitted data packets is an acceleration action intended to improve the user experience.
A typical server cluster uses at least one shared disk. The shared disk is needed for the fault-tolerance of the server cluster. If a first server belonging to the server cluster crashes, a second server can read the first server's data from the shared disk and continue the first server's tasks. Commonly, the shared disk is duplicated to ensure the operation of the server cluster if one of the shared disks crashes.
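By way of illustration only, the takeover described above can be sketched as follows. This is a minimal sketch, not a description of any particular product: the `Server` class, its method names, and the use of an in-memory dictionary in place of a shared disk are all assumptions made for the example.

```python
class Server:
    """Hypothetical cluster member that records its tasks on a shared store."""

    def __init__(self, name, shared_disk):
        self.name = name
        self.shared_disk = shared_disk  # a dict standing in for the shared disk

    def start_task(self, task_id, state):
        # Persist the task so that another server could continue it.
        self.shared_disk[task_id] = {"owner": self.name, "state": state}

    def take_over(self, failed_name):
        """Claim every task recorded on the shared disk by the failed server."""
        taken = []
        for task_id, record in self.shared_disk.items():
            if record["owner"] == failed_name:
                record["owner"] = self.name
                taken.append(task_id)
        return taken


shared_disk = {}
first = Server("first", shared_disk)
second = Server("second", shared_disk)
first.start_task("request-1", "half done")

# The first server crashes; the second reads its data and continues.
recovered = second.take_over("first")
print(recovered, shared_disk["request-1"]["owner"])
```

Duplicating the shared disk, as mentioned above, would correspond to keeping a second copy of the dictionary in step with the first; that is omitted here for brevity.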
The model of a server cluster can be utilized in various networks, such as the Internet. The Internet was originally composed of fixed, i.e. wired, networks with stationary nodes. During the last decade of the 20th century the number and importance of mobile radio networks increased, and during that period radio networks were incorporated into the Internet. The transmission capacity of radio networks is on average more limited than that of fixed networks, but transmission capacity is not the only reason why the Internet protocols operate badly under radio network conditions.
FIG. 1 depicts Internet traffic in a fixed network and in a radio network. In this example application 1 uses the fixed network and an identical application 2 uses the radio network. Both applications send three HTTP requests to the same node X, concerning the same content. In response to the HTTP requests, both applications receive three transmissions from node X. Application 1 and application 2 send their first HTTP requests 11 and 12 simultaneously at time T0. Application 1 receives its last response transmission (13) at time T1, and application 2 receives its last response transmission (14) at time T2. The time period between T0 and T2 is about double the period between T0 and T1.
FIG. 2 presents an essential reason why Internet traffic is slower in a radio network than in a fixed network, even though the transmission capacities of the said networks should in theory be equal. Time division multiplexing (TDM) is a multiplexing method in which each application is allowed to use a certain number of consecutive time slots (0–4 slots are commonly used) of a certain radio channel in a certain time period. This application will alternatively refer to such a group of time slots as a ‘burst’. FIG. 2 illustrates transmissions related to one application. The application is allowed to transmit five bursts and the temporal starting points of these bursts are marked with SP1–SP5 respectively. For example, at starting point 1 the application obtains four time slots and at starting point 2 the application obtains two time slots. However, the application can only partly utilize the time slots and the transmission capacity related to them. Because of latency in a radio network, a part of the transmission capacity is wasted. Various reasons, which are known in the art, cause the latency. One reason is that a radio channel cannot be shared between several users as efficiently as a wire used in a fixed network. Another reason is that the Internet protocols are poorly suited for radio networks. More specifically, because TCP requires numerous handshakes between an application and a node before data exchange can occur, it imposes significant latency.
FIG. 3 shows a set of Internet protocols and the OSI model. The OSI model is an international standard defined by the ISO (International Organization for Standardization). The seven layers of OSI are marked in the figure; an application layer (31) is the topmost layer and a physical layer (32) is the lowest layer. Hypertext transfer protocol (HTTP) (33) belongs to the application layer, Internet protocol (IP) (34) belongs to a network layer, and Point-to-point tunneling protocol (PPTP) (35) belongs to a data link layer. TCP (36) and User datagram protocol (UDP) (37) are both placed on the transport layer of the OSI model. The arrows between the protocols (33–37) demonstrate which protocols are needed when an application on the application layer communicates with equipment on the physical layer. For example, HTTP, TCP, IP, and PPTP are a set of protocols enabling the said communication. FIG. 3 depicts only some of the protocols that comprise the Internet protocol suite. It should be noted that TCP/IP does not easily lend itself to precise mapping to the ISO-OSI model. Thus, in these specifications the placement of the various TCP/IP protocols into the OSI model should be construed as loose examples, done more to enhance understanding of the invention than in a limiting way.
FIG. 4 shows an example of a data packet and its headers. On the application layer the packet includes only data (41). On the transport layer a header TH (42) is added to the packet. On the network layer another header NH (43) is added to encapsulate the packet and the TH header (42). On the data link layer another header DH (44) is added. Then the packet, now containing all the above headers, is sent via the physical layer to its destination. At the destination, the same layers, in reverse order, remove the corresponding headers and operate to deliver the original packet to the appropriate application.
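The header handling described for FIG. 4 may be illustrated, by way of example only, with the following minimal sketch. The two-byte tags `TH`, `NH`, and `DH` are placeholders chosen to echo the reference labels in the figure; real protocol headers carry structured fields and are considerably longer.

```python
# Placeholder tags standing in for the transport, network, and data link headers.
LAYERS = [("transport", b"TH"), ("network", b"NH"), ("data link", b"DH")]


def encapsulate(data: bytes) -> bytes:
    """Sender side: each successive layer prepends its header to what it received."""
    packet = data
    for _name, header in LAYERS:
        packet = header + packet
    return packet


def decapsulate(packet: bytes) -> bytes:
    """Receiver side: the layers run in reverse order, each removing its own header."""
    for name, header in reversed(LAYERS):
        if not packet.startswith(header):
            raise ValueError(f"missing {name} header")
        packet = packet[len(header):]
    return packet


wire = encapsulate(b"data")
print(wire)               # b'DHNHTHdata'
print(decapsulate(wire))  # b'data'
```

The outermost header on the wire is the one added last, which is why the receiver must remove headers in the reverse of the order in which the sender added them.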
We have chosen to use the general terms packet and header in this application to encompass related terms, such as the word datagram, which is commonly used to describe certain types of packets. It is also common to describe a header as a ‘frame’ as it encapsulates the data in the packet. As mentioned above, on the application level a packet includes only data. When the packet is handled according to a certain protocol, a header is typically added to it. The header determines how the packet will be handled at the receiver's end, and sometimes includes addressing and other information.
FIG. 5 shows the data structure of an IP header (IPv4). The IP header (IPv4) is described in RFC (Request for Comments) 791 published by the Internet Engineering Task Force, at www.ietf.org. The IP header consists of six groups of 32 bits each. The source address (51), the destination address (52), and a field termed “Option+Padding” (53) are each 32 bits long. The other fields comprising the header data structure are shorter. On the sender's end an IP header is added to a packet. Correspondingly, on the receiver's end the IP header is read and removed, and the packet is handled according to the Internet protocol and the content of the IP header. Handling of an IP header means, for example, that a destination address is read from the IP header and the packet is finally sent to the destination address. Since the packets are handled serially by different protocol levels, the collection of protocols is commonly referred to as a ‘protocol stack’.
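As a non-limiting illustration of reading an IP header at the receiver's end, the following sketch decodes the fixed 20-byte part of an IPv4 header using the field layout of RFC 791. The function name `parse_ipv4_header` and the sample addresses are assumptions made for this example. The sample header carries no options, so its length is five 32-bit groups; a header that also includes the 32-bit “Option+Padding” field would report six.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (field layout per RFC 791)."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_words": ver_ihl & 0x0F,   # header length in 32-bit groups
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,                # 6 = TCP, 17 = UDP
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hand-built sample header (checksum left at zero for brevity).
sample = struct.pack(
    "!BBHHHBBH4s4s",
    (4 << 4) | 5, 0, 40, 1, 0, 64, 17, 0,
    socket.inet_aton("192.0.2.1"),
    socket.inet_aton("198.51.100.7"),
)
fields = parse_ipv4_header(sample)
print(fields["source"], "->", fields["destination"])
```

Reading the destination address in this way is the step that precedes forwarding the packet towards that address.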
The IP header (IPv4) is just one example of a header. RFC 1883 describes another version of the IP header. In general, each protocol has its own type or types of headers. The TCP header and the UDP header are other important headers relevant to the present invention, and are described in RFC 793 and RFC 768 respectively.
Among other fields, TCP headers include a source port, a destination port, a sequence number, an acknowledgement number, and a window. The source port serves to associate a packet with a sending process, and the destination port similarly associates a packet with a receiving process. The sequence number field carries the sequence number of a transmitted packet belonging to a transmission stream. Thus, the packet receiver can detect if some packet is missing from the transmission stream. The acknowledgement field is used to indicate to the sending process that the receiving process has indeed received certain packets, and optionally to cause retransmission of packets that arrived corrupted or that are missing from the sequence. The window field carries the number of octets that the sending process is allowed to transmit before the next acknowledgement.
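The fields listed above may be illustrated, by way of example only, with the sketch below. It decodes the fixed 20-byte part of a TCP header following the layout of RFC 793 and shows a deliberately simplified way in which a receiver could notice a gap in the sequence numbers. The names `parse_tcp_header` and `first_missing_sequence` are hypothetical, and the gap check assumes segments arrive in order, which real TCP does not require.

```python
import struct


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (field layout per RFC 793)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, _checksum, _urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence": seq,
        "acknowledgement": ack,
        "header_words": offset_flags >> 12,     # header length in 32-bit words
        "ack_flag": bool(offset_flags & 0x10),  # set when the acknowledgement field is valid
        "window": window,
    }


def first_missing_sequence(segments):
    """Given (sequence, payload_length) pairs in arrival order, return the first
    sequence number that was expected but not seen, or None if there is no gap."""
    expected = segments[0][0]
    for seq, length in segments:
        if seq != expected:
            return expected
        expected = (seq + length) % 2**32  # sequence numbers wrap at 32 bits
    return None


sample = struct.pack("!HHIIHHHH", 5000, 80, 1000, 2000, (5 << 12) | 0x10, 4096, 0, 0)
header = parse_tcp_header(sample)
print(header["sequence"], header["acknowledgement"], header["window"])
print(first_missing_sequence([(1000, 100), (1100, 100), (1300, 100)]))  # 1200
```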
TCP is a connection-oriented protocol, which means that the protocol acts by establishing a ‘virtual connection’ between sender and receiver. The virtual connection is said to emulate a direct, wired connection between sender and receiver, and guarantees certain reliable data transfer characteristics. The period in which the virtual connection exists is called a session. The connection is established between the sending and the receiving processes. During the session the sender and receiver update sequence numbers, acknowledgement numbers, and window fields in exchanged packets. Once communication is completed the virtual connection is disconnected. Conversely, UDP is a connectionless protocol, i.e. no virtual connection is established between the sending application and the receiving application. Thus, the UDP protocol by itself does not provide a session similar to that of TCP.
The UDP header includes only four fields, which are termed a source port, a destination port, a length, and a checksum. The source port indicates a sending application and the destination port a receiving application.
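For illustration only, the four-field UDP header can be built and read back as in the following sketch, which follows the layout of RFC 768. The function names are hypothetical, and the checksum is left at zero, which RFC 768 defines as meaning that the transmitter generated no checksum.

```python
import struct

UDP_HEADER_LENGTH = 8  # four 16-bit fields


def build_udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    """Pack source port, destination port, length, and checksum (layout per RFC 768)."""
    length = UDP_HEADER_LENGTH + len(payload)  # the length field covers header plus data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


def parse_udp_header(raw: bytes) -> tuple:
    """Return (source port, destination port, length, checksum)."""
    return struct.unpack("!HHHH", raw[:UDP_HEADER_LENGTH])


udp_header = build_udp_header(5000, 53, b"hello")
print(parse_udp_header(udp_header))  # (5000, 53, 13, 0)
```

The header contains no sequence or acknowledgement numbers, so nothing in it ties one packet to the next; this is the absence of a session noted above.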
Network Address Translation (NAT) is a common method of mapping address spaces between two communication networks. NAT equipment is any piece of equipment performing NAT functionality. Relating to the present invention, one network is the Internet and the other one may be any communication network, such as a local area network (LAN) or a General packet radio services network (GPRS network). NAT equipment can be, for example, a firewall or a node termed a Gateway GPRS support node (GGSN). NAT equipment maps the sender of a source network to the receiver of a destination network. The relation between the sender and receiver is termed a mapping. NAT equipment stores these mappings in a mapping table and, by using the mapping table, transfers packets from senders to receivers and vice versa.
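A mapping table of the kind described above may be sketched as follows. This is a toy model under stated assumptions, not a description of any actual NAT equipment: the `NatTable` class, its method names, and the rule that each (sender, receiver) mapping occupies one external port from a small pool are all chosen for the example.

```python
class NatTable:
    """Toy mapping table: one record and one external port per (sender, receiver) pair."""

    def __init__(self, ports):
        self.free_ports = list(ports)
        self.by_pair = {}  # (sender, receiver) -> external port
        self.by_port = {}  # external port -> (sender, receiver)

    def outbound(self, sender, receiver):
        """Return the external port for this pair, creating a mapping if needed."""
        pair = (sender, receiver)
        if pair not in self.by_pair:
            if not self.free_ports:
                raise RuntimeError("no free NAT ports")
            port = self.free_ports.pop(0)
            self.by_pair[pair] = port
            self.by_port[port] = pair
        return self.by_pair[pair]

    def inbound(self, port):
        """Return the original sender for a reply arriving on an external port."""
        return self.by_port[port][0]


nat = NatTable(range(40000, 40002))
client = ("10.0.0.5", 3333)
service = ("203.0.113.9", 80)

port = nat.outbound(client, service)
print(port, nat.inbound(port))
```

Repeated packets between the same sender and receiver reuse the existing record, whereas a packet to a new receiver creates a new record and takes another port from the pool.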
FIG. 6 shows an example of a server cluster (601) containing a master node (602) and three slave nodes (603) (604) (605). The figure further includes two clients (606) (607), a communication network (608), the Internet (609), two Internet services (610) (611), and NAT (612). The server cluster operates between NAT and the Internet. The communication between the clients and the Internet services must be performed via the master node 602 and one or more of the three slave nodes.
In the prior art a client sees a server cluster as one entity having one IP address, i.e. Internet protocol address. The client communicates with the master node of the server cluster, for example, by using TCP, in which case the connection is termed a TCP connection. The TCP connection can be copied from one node to another, for example, by using a shared disk. In other words, the master node stores the TCP connection data on the shared disk, from where the data can be copied to a slave node. In practice, TCP and UDP are the only practical alternatives on the transport layer of the OSI model, so one of them must be used in data transmission. UDP includes fewer handshakes than TCP and thus it is more efficient in data transmission. UDP is preferably used when transmitting video and/or audio streams.
In addition to the load balancing of a server cluster, there are other reasons for rerouting traffic. One reason is that a communication link to a certain node may be overloaded and the load of communication links should be balanced. Another possible reason is the desire to reroute traffic to a node advertising certain products. The prior art methods for rerouting traffic suffer from several drawbacks.
Firstly, as all traffic must initially flow to the master node before being directed to the slave nodes, the master node's communication link is prone to overload and thus limits the performance of the whole cluster. Secondly, load balancing, or more generally rerouting of traffic, must be performed for each packet of a UDP transmission, because UDP lacks a session feature. As mentioned above, TCP includes the session feature, so rerouting of TCP traffic is performed only once for each session. Because rerouting of UDP traffic must be performed for each packet, it consumes a lot of processor time. The NAT equipment has a mapping table for mappings. If UDP traffic is rerouted, new mappings are needed, i.e. the size of the mapping table will increase. Each mapping creates one record in the mapping table and consumes one port. Therefore NAT equipment may run out of available ports, disabling the rerouting of traffic.
Thirdly, as UDP is not organized into sessions, tracking UDP load information is more complicated, and thus load balancing becomes more complex and less efficient.
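The difference between per-session and per-packet rerouting may be illustrated with the sketch below. It is a simplified model for explanation only: the `Dispatcher` class, the round-robin selection policy, and the node names (which merely echo the reference numerals of FIG. 6) are assumptions and do not describe any particular prior art product.

```python
from itertools import cycle


class Dispatcher:
    """Toy master node that counts how many rerouting decisions it has to make."""

    def __init__(self, nodes):
        self._next_node = cycle(nodes)  # round-robin, a stand-in selection policy
        self.sessions = {}              # TCP session id -> chosen node
        self.decisions = 0

    def _choose(self):
        self.decisions += 1
        return next(self._next_node)

    def route_tcp(self, session_id):
        # One decision per session; later packets reuse the stored choice.
        if session_id not in self.sessions:
            self.sessions[session_id] = self._choose()
        return self.sessions[session_id]

    def route_udp(self, _packet):
        # No session to consult, so every packet needs its own decision.
        return self._choose()


nodes = ["node-603", "node-604", "node-605"]

tcp = Dispatcher(nodes)
tcp_targets = {tcp.route_tcp("session-A") for _ in range(100)}

udp = Dispatcher(nodes)
udp_targets = {udp.route_udp(i) for i in range(100)}

print(tcp.decisions, len(tcp_targets))  # 1 1
print(udp.decisions, len(udp_targets))  # 100 3
```

In this model one hundred TCP packets cost a single decision and reach a single node, while one hundred UDP packets cost one hundred decisions and are spread over three nodes. Under the description above, each distinct target would require its own mapping and port in the NAT equipment.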
It should be noted that in these specifications a server is often referred to as a node, as it is a node on the network. Thus, for example, a ‘master server’ is equivalently referred to as a master node. This usage also shows that the word ‘server’ should not be construed to limit the invention to a computer server; it also extends to other computing and networking equipment adapted to perform the node's respective function.