Clustering servers enables parallel access to data, which can help provide the redundancy and fault resilience required for business-critical applications. Clustering applications, such as HACMP™ (High Availability Cluster Multi-Processing), provided by International Business Machines (“IBM”) of Armonk, N.Y., provide tools to help install, configure and manage clusters in a highly productive manner. HACMP™ provides monitoring and recovery of clustered computer resources for use in providing data access and backup functions (e.g., a mission-critical database). HACMP™ also enables server clusters to be configured for application recovery/restart to provide protection for business-critical applications through redundancy.
Typically, a High Availability Cluster is a group of loosely coupled nodes that work together to ensure a reliable service to clients. High availability is achieved by continuously monitoring the state of the applications and of all the resources on which the applications depend. If an application terminates abnormally or if the operating system suddenly fails, the applications are automatically restarted on the backup server. This process of restarting the application on a backup server is herein referred to as “fall-over”. When the network adapter or operating system fails, clusterware within the HACMP™ environment initiates an application fall-over during which, along with critical applications, the Internet Protocol (“IP”) address of the primary server used by the applications to communicate with the clients is also moved to the backup server. The clients generally reconnect to the same IP address, which is now held by the backup server. Therefore, the TCP/IP address is also considered a highly available resource and is referred to as a “service IP address”.
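By way of illustration only, the fall-over described above can be modeled with a minimal Python sketch. The `Node` class, the `fall_over` function, the application name and the address `192.0.2.10` are hypothetical and are not part of HACMP™ or any other clusterware; the sketch only tracks which node runs the application and holds the service IP address.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Node:
    """Hypothetical model of a cluster node (not an HACMP data structure)."""
    name: str
    healthy: bool = True
    addresses: Set[str] = field(default_factory=set)  # IP addresses held by the node
    running: Set[str] = field(default_factory=set)    # applications running on the node


def fall_over(primary: Node, backup: Node, service_ip: str, app: str) -> bool:
    """Move the application and its service IP to the backup if the primary failed.

    Returns True when a fall-over was performed, False otherwise.
    """
    if primary.healthy or not backup.healthy:
        return False
    # Release the resources from the failed primary (in the model only).
    primary.addresses.discard(service_ip)
    primary.running.discard(app)
    # Restart the application on the backup and bring up the same service IP there.
    backup.running.add(app)
    backup.addresses.add(service_ip)
    return True


# Example: the primary fails and the backup takes over the service IP address.
primary = Node("primary", addresses={"192.0.2.10"}, running={"database"})
backup = Node("backup")
primary.healthy = False
moved = fall_over(primary, backup, "192.0.2.10", "database")
```

In this model, clients that keep addressing `192.0.2.10` reach the backup node once the fall-over has completed, which is why the address itself is treated as a highly available resource.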
However, when the IP address of the primary server is moved to the backup server, the clients are unaware of this move until a TCP/IP timeout occurs. In other words, even though the primary server has failed, the client continues to send TCP/IP packets to the primary server. Eventually, the client determines that a reply has not been received after a timeout period has expired. The client then disconnects its current connection with the primary server and establishes a new connection with the backup server. This process is very costly for service providers because of the lengthy downtime experienced before the client establishes a new connection with the backup server.
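The cost of waiting for the timeout can be made concrete with a small Python sketch. The names `FailoverTimeline`, `client_reconnect_time` and `wasted_wait`, as well as the numeric values, are assumptions chosen for illustration and are not measurements. The sketch further assumes that the client sends its last request at the instant of failure and retries without additional delay once its timeout has expired.

```python
from dataclasses import dataclass


@dataclass
class FailoverTimeline:
    """Times in seconds, measured from the moment the primary server fails."""
    takeover_done: float   # backup holds the service IP and has restarted the application
    client_timeout: float  # how long the client waits for a reply before giving up


def client_reconnect_time(t: FailoverTimeline) -> float:
    """Return when the client has a working connection to the backup server."""
    # The client cannot notice the failure before its timeout expires, and it
    # cannot reconnect successfully before the backup has taken over.
    return max(t.client_timeout, t.takeover_done)


def wasted_wait(t: FailoverTimeline) -> float:
    """Return how long the backup was ready while the client was still waiting."""
    return max(0.0, t.client_timeout - t.takeover_done)


# Hypothetical values: the backup is ready after 5 s, the client times out after 60 s.
timeline = FailoverTimeline(takeover_done=5.0, client_timeout=60.0)
downtime = client_reconnect_time(timeline)
idle = wasted_wait(timeline)
```

Under these assumed values, the client-visible downtime is governed by the client timeout rather than by the speed of the fall-over, and the difference between the two is time during which the backup server is available but unused.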
Therefore, a need exists to overcome the problems with the prior art as discussed above.