1. Field of the Invention
This invention pertains generally to communication over a network between computers in the presence of faults in those computers, and more particularly to the execution of connection-oriented communication protocols.
2. Description of Related Art
Computers use communication protocols executed by communication routines for exchanging information between them. An important class of communication protocols is the class of connection-oriented communication protocols that operate over an underlying network communication protocol. The most widely used communication protocols in this class today comprise the Transmission Control Protocol (TCP) operating on top of the Internet Protocol (IP).
Connection-oriented communication protocols require one computer (i.e., the client) to initiate a connection to another computer (i.e., the server). Once the connection is established, the client and the server can exchange data. The connection remains established until both client and server endpoints terminate the connection, or one endpoint fails.
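The connection lifecycle described above (the client initiates, both sides exchange data, the endpoints terminate) can be illustrated with a minimal sketch using TCP over the loopback interface. The function names `run_echo_server` and `echo_once`, the echo behavior, and the use of an ephemeral port are assumptions made for illustration only and are not part of the described systems.

```python
import socket
import threading


def run_echo_server(listener: socket.socket) -> None:
    """Server endpoint: accept one connection and echo data back."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # empty read: the client closed its end
                break
            conn.sendall(data)


def echo_once(message: bytes) -> bytes:
    """Open one connection, send a message, and return the echoed reply."""
    # Server side: listen on an ephemeral loopback port (illustrative choice).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=run_echo_server, args=(listener,))
    server.start()

    # Client side: initiate the connection, exchange data, then terminate.
    reply = b""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        while len(reply) < len(message):
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk

    server.join()
    listener.close()
    return reply
```

If the server process were to fail while the connection was open, the client's socket would report an error or a closed connection, which is the situation that connection failover addresses.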
To achieve fault tolerance of the server, the server is replicated with a primary server replica and one or more backup server replicas, so that the client continues to receive service despite the failure of a server. If the primary server fails, a backup server takes over the role of the failed primary server and the client establishes a new connection to the backup server. The operations involved in the backup server taking over the role of the failed primary are referred to as a failover operation. There are several approaches that allow the client to use the same server address to connect to the backup server and, thus, mask the fact that the client is communicating with a different server. The masking of the failover operation, without the client having to establish a new connection to the backup server and without any modification to the client computer's software or hardware, is the subject of this invention and is referred to as transparent connection failover.
Systems have been proposed that allow the client to maintain an established connection with the server even if the server fails. However, they often require modifications to the network infrastructure, the client application or the client computer's protocol stack. Those systems suffer from the drawback that the network and the client computers often belong to organizations that are different from that of the server and, therefore, the client computer's software or hardware cannot be easily modified.
U.S. Patent Publication No. 20010056492 describes a system in which client-server TCP/IP communication is intercepted and logged at a backup computer. When the server fails, the server application is restarted and all TCP/IP stack activity is replayed. The backup computer performs an IP takeover, in which it takes over the role of the server computer for the remaining lifetime of the connection. No modifications to the client's TCP/IP protocol stack, the client application or the server application are required. To operate properly, the backup computer must be operational before the connection between the client and the server is established. Although the failover happens transparently to the client, the failover time can be significant because the entire history of the connection must be replayed.
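The cost of a log-and-replay recovery can be illustrated with a minimal sketch. It assumes a deterministic server so that replaying the logged requests reproduces the lost state. The classes `CounterServer` and `LoggingBackup` are hypothetical stand-ins and do not reflect the actual design of the cited publication, which replays TCP/IP stack activity rather than application-level requests.

```python
from typing import List, Tuple


class CounterServer:
    """Hypothetical deterministic server that keeps a running total."""

    def __init__(self) -> None:
        self.total = 0

    def handle(self, request: int) -> int:
        self.total += request
        return self.total


class LoggingBackup:
    """Hypothetical backup that records every request it observes."""

    def __init__(self) -> None:
        self.log: List[int] = []

    def observe(self, request: int) -> None:
        self.log.append(request)

    def recover(self) -> Tuple[CounterServer, int]:
        """Restart the server and replay the entire request history."""
        fresh = CounterServer()
        replayed = 0
        for request in self.log:
            fresh.handle(request)
            replayed += 1
        return fresh, replayed
```

In this sketch the number of replayed requests equals the length of the log, so recovery time grows with the age of the connection, which corresponds to the drawback noted above.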
TCP splicing (O. Spatscheck, J. S. Hansen, J. H. Hartman and L. L. Peterson, Optimizing TCP forwarder performance, IEEE/ACM Transactions on Networking, vol. 8, no. 2, April 2000, pp. 146-157) is a technique that is used to improve performance and scalability of application-level gateways. Clients establish TCP connections to a dispatcher application. The dispatcher chooses an appropriate server to handle a client connection, and then modifies the TCP/IP stack of the dispatcher computer to forward all TCP packets of the connection directly to the selected server. No further involvement of the dispatcher is required until the connection is terminated. TCP splicing requires all traffic to flow through the dispatcher.
TCP handoff (M. Aron, D. Sanders, P. Druschel and W. Zwaenepoel, Scalable content-aware request distribution in cluster-based network servers, Proceedings of the USENIX 2000 Annual Technical Conference, San Diego, Calif., June 2000, pp. 323-336) removes the dispatcher by letting the client connect directly to one of the servers. If the initial server decides that another server is better suited to handle the connection, it transfers the TCP connection state to an alternate server. TCP handoff requires a special front-end layer-4 switch that routes the packets to the appropriate server.
TCP migration (A. C. Snoeren, D. G. Andersen and H. Balakrishnan, Fine-grained failover using connection migration, Proceedings of the USENIX Conference on Internet Techniques and Systems, San Francisco, Calif., March 2001, pp. 221-232) is a technique that is transparent to the client application but requires modifications to both the client and server TCP/IP stacks. Modifications to the network infrastructure (e.g., Internet routers, underlying protocols) are not required. The client or any of the servers can initiate migration of the connection. At any point in time, only one server is connected to the client. Multicasting or forwarding of the client's message is not possible.
Other researchers (F. Sultan, K. Srinivasan, D. Iyer and L. Iftode, Migratory TCP: Connection migration for service continuity in the Internet, Proceedings of the IEEE International Conference on Distributed Computing Systems, Vienna, Austria, July 2002, pp. 469-470) propose a TCP connection migration scheme that requires the cooperation of both the client and server TCP/IP stacks. The client initiates the migration. During the migration process, both servers must be operational, which renders this approach appropriate for load balancing but not useful for fault tolerance.
The HydraNet system (G. Shenoy, S. K. Satapati and R. Bettati, HydraNet-FT: Network support for dependable services, Proceedings of the IEEE International Conference on Distributed Computing Systems, Taipei, Taiwan, April 2000, pp. 699-706) replaces a single server with a group of server replicas. It does not require any modification of the client's TCP/IP stack. Instead, all IP packets sent by the client to a certain IP address and port number are multicast to the group of server replicas. For this scheme to work, all traffic must go through a special redirector, which resides on an Internet router. To maintain consistency between the server replicas, the system employs an atomic multicast protocol. The forwarding service is not restricted to TCP, but can accommodate any transport protocol that is based on IP.
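The redirector idea can be sketched as a lookup keyed on the destination address and port that fans each packet out to every registered replica. The classes `Replica` and `Redirector` are hypothetical names introduced for illustration. The sketch does not implement an atomic multicast protocol; it merely delivers sequentially in a single thread, which trivially gives every replica the same order.

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[str, int]  # (destination IP address, destination port)


class Replica:
    """Hypothetical server replica that records delivered payloads."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.delivered: List[bytes] = []

    def deliver(self, payload: bytes) -> None:
        self.delivered.append(payload)


class Redirector:
    """Hypothetical redirector mapping a service endpoint to a replica group."""

    def __init__(self) -> None:
        self.groups: Dict[Endpoint, List[Replica]] = {}

    def register(self, service: Endpoint, replicas: List[Replica]) -> None:
        self.groups[service] = list(replicas)

    def forward(self, dst: Endpoint, payload: bytes) -> int:
        """Deliver the payload to every replica registered for dst.

        Returns the number of replicas that received the payload; traffic
        for an unregistered endpoint is delivered to no one in this sketch.
        """
        replicas = self.groups.get(dst, [])
        for replica in replicas:
            replica.deliver(payload)
        return len(replicas)
```

Because every replica sees the same payloads in the same order, deterministic replicas remain in the same state, and the client needs no knowledge of the group.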
The SwiFT system (H. Y. Huang and C. Kintala, Software implemented fault tolerance, Proceedings of the IEEE Fault Tolerant Computing Symposium, Toulouse, France, June 1993, pp. 2-10) provides fault tolerance for user applications. SwiFT consists of modules for error detection and recovery, checkpointing, event logging and replay, communication error recovery and IP packet rerouting. The latter is achieved by providing a single IP image for a cluster of server computers. Addressing within the cluster is done by Media Access Control (MAC) addresses. All traffic from the clients is sent to a dispatcher, which forwards the packets to one of the server computers. A client must run the SwiFT client software to reestablish the TCP connection if the server fails.
Rerouting of IP packets (A. Bhide, E. N. Elnozahy and S. P. Morgan, A highly available network file server, Proceedings of the 1991 USENIX Winter Conference, Dallas, Tex., January 1991, pp. 199-205) is proposed in a scheme that reroutes IP packets from a primary server to a backup server. If the primary server fails, the backup server changes its IP address to the address of the primary server. The backup server then sends a gratuitous Address Resolution Protocol (ARP) request to announce that it can now be found at the primary's address. From then on, all IP packets that are addressed to the primary server are sent to the backup server.
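The gratuitous ARP announcement can be illustrated by constructing the frame that a backup server would broadcast after assuming the primary's IP address. The following is a minimal sketch that assumes Ethernet and IPv4. The function name `build_gratuitous_arp` and the example addresses are hypothetical, and the sketch only builds the bytes; actually transmitting them requires a raw socket and elevated privileges, which is omitted here.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    In a gratuitous ARP request the sender and target protocol addresses
    are both the announced IP address, and the frame is broadcast.
    """
    sender_mac = bytes.fromhex(mac.replace(":", ""))
    announced_ip = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    unknown_mac = b"\x00" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = struct.pack("!6s6sH", broadcast, sender_mac, 0x0806)

    # ARP payload: Ethernet (1), IPv4 (0x0800), address lengths 6 and 4,
    # operation 1 (request), then sender and target address pairs.
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4, 1,
        sender_mac, announced_ip,
        unknown_mac, announced_ip,
    )
    return ethernet + arp
```

Hosts and routers on the local segment that receive such a frame typically update their ARP caches, so that subsequent packets for the announced IP address are sent to the backup server's MAC address.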
Replication of Web services (N. Aghdaie and Y. Tamir, Client-transparent fault-tolerant Web service, Proceedings of the IEEE International Conference on Performance, Computing and Communications, Phoenix, Ariz., April 2001, pp. 209-216) is used in a system that allows a client to continue to use a TCP connection transparently when the primary server fails. This approach does not require changes to the hardware or software infrastructure but, rather, uses two proxies at each server that are implemented in user space to avoid changes to the operating system of the server computer. The server application is passively replicated, and the backup proxy logs client requests and server replies. The drawback of this approach is the degraded performance that results from the context switches and protocol stack traversals that are needed for an implementation in user space.
Therefore, a need exists for a method of maintaining a network connection between a client and a replicated server without the need for the client to establish a new connection if one of the servers fails and without the need for any modifications to the application code, communication routines or other hardware or software infrastructure at the client, so that the connection failover is transparent to the client. The present invention satisfies that need, as well as others, and overcomes the deficiencies of previously developed methods for providing network connection failover.