A cluster is a set of application server instances, running on independent servers, configured to act in concert to deliver greater scalability and availability than a single instance can provide. While a single application server instance can only leverage operating resources of a single host, a cluster can span multiple hosts, distributing application execution over a greater number of CPUs. While a single application server instance is vulnerable to the failure of its host and operating system, a cluster configured for high availability (HA) continues to function despite the loss of an operating system or host, hiding any such failure from clients.
In clustered enterprise applications, servers may be configured to broadcast their states (sessions) for in-memory replication. This ensures that when one server goes down, its clients are immediately redirected to a server that has backed up the states of the other servers. Transactions therefore continue without interruption.
While the replication process is relatively cheap for two servers that back each other up, it is time-consuming for a configuration with three or more servers. In prior systems, when a server receives a state update request from a client, it broadcasts its latest states and waits for acknowledgements from all other servers in the cluster. However, because the broadcast typically uses a datagram protocol (which is unreliable), messages are occasionally dropped and the states must be broadcast again.
As the number of servers in a cluster increases, the probability of dropped messages also increases. As a result, the latency, which is the time between a client request and the corresponding server response, also increases. This is unacceptable for applications in which client/server response time is critical. In one preliminary test, the message drop rate (messages lost or received with errors) was about 20%. Of course, this percentage fluctuates across environments and situations. In a three-server configuration, each server broadcasts to and waits for the other two servers, and each of those two messages independently has a 20% chance of being dropped. Hence the probability of a message re-send (and with it the client/server latency) increases drastically with the number of HA servers in the cluster: the more redundancy is configured in the cluster, the more latency is observed between the client and the server.
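The arithmetic behind this observation can be sketched as follows. This is an illustrative calculation only, not part of any prior system: it assumes that each of the other servers misses a given broadcast independently with the same drop rate, and that a re-send is needed whenever at least one of them misses it. The function name `resend_probability` and its parameters are hypothetical.

```python
def resend_probability(num_servers: int, drop_rate: float) -> float:
    """Chance that a broadcast must be re-sent.

    Assumption (illustrative): each of the other (num_servers - 1)
    servers misses the broadcast independently with probability
    drop_rate, and a re-send is needed if at least one misses it.
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError("drop_rate must be between 0 and 1")
    peers = num_servers - 1
    # Probability that every peer receives the message is (1 - p)^peers;
    # the complement is the probability of at least one drop.
    return 1.0 - (1.0 - drop_rate) ** peers


if __name__ == "__main__":
    # 20% is the drop rate reported for the preliminary test.
    for n in (2, 3, 5, 10):
        print(f"{n} servers: {resend_probability(n, 0.20):.1%} chance of re-send")
```

Under these assumptions, a two-server cluster re-sends about 20% of the time, while the three-server configuration re-sends about 36% of the time (1 − 0.8 × 0.8), and the figure keeps rising as servers are added.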
The present invention provides a new and useful method and system of object replication that addresses the above problems.