The present invention relates to replication and, more specifically, to techniques for preserving consistency of passively-replicated non-deterministic objects.
It has become critical that the electronic systems on which society relies remain available and consistent. With respect to availability, a single instance of unavailability may tarnish or even cripple a company that relies on electronic commerce. With respect to consistency, there is a certain class of operations that should be performed if and only if other operations are also performed. For example, in a transfer of funds between bank accounts, the withdrawal of money from one account should occur if and only if the deposit of money into the other account also occurs.
The definition of consistency must be refined in the context of highly-available mission-critical applications. Online shopping, for instance, involves critical interactions between clients and the business application on the electronic commerce site. Unfortunately, when an error occurs during request processing (such as the failure of the client, the network, or the server), the client generally has no way to know whether his request has been processed or not.
Application servers generally provide at-most-once semantics, guaranteeing that, if the request has been processed, it has been processed only once. This quality of service is not sufficient since it puts the responsibility on the client to decide whether or not the request must be reissued.
The quality of service required by mission-critical distributed applications is exactly-once. Ideally, the client wants the guarantee that its request will eventually be processed, and that processing will happen only once. This problem is known as end-to-end reliability. The "all-or-nothing" property necessary for application consistency should ideally become "all".
End-to-end reliability can be defined as the guarantee that a request sent by a client to a server will be eventually processed and that the client will get a reply, despite the failure of any server component in the system. The request will be processed exactly once by the server. In addition, if the client fails it can still obtain the reply after recovering.
A typical scenario is that of an end-user buying airplane tickets through an electronic-commerce site. If the site fails before the user checks out, then he will have to start his selection over, but he will not be billed for what he had selected before the failure. However, if a failure occurs after the user has confirmed his purchase but before the reception of the receipt, then he usually cannot know whether the request has been processed or not; the site has to provide journaling mechanisms that can be accessed by end users.
In this scenario, end-to-end reliability means that the user has the guarantee that his request will be eventually processed once it has been submitted, if the client application (e.g., web browser) does not fail. If the client fails, then the request can be re-issued upon recovery with the guarantee that it will not be processed twice.
This scenario can be extended to involve nested invocations. For example, end-users can plan their vacations through a site that mediates the purchase of airline tickets, car rentals, and hotel reservations. This example illustrates the use of nested invocations between application servers: the vacations site acts as a client of other sites for booking plane tickets, cars, and hotel rooms.
This scenario shows problems that may happen with partial request execution. Consider the case of an end-user who wants to travel to London. If the vacations planning server books a car and a hotel room in London but fails before obtaining the plane ticket, the first two reservations are meaningless. Note that, in this case, end-to-end reliability does not mean that all three reservations will succeed. Rather, it means that the client's request will be processed completely. The vacations planner site may, for instance, cancel the car and hotel reservations if all planes to London are full.
Replication is a technique that is widely used to increase the availability of systems. In general, replication involves maintaining copies ("replicas") of a resource so that if one replica fails, another replica may be used. For example, many clients may require access to a particular database table. To increase availability, many copies of the table may be managed by many different database servers. If one of the copies of the table becomes corrupt, or if one of the database servers fails, all clients that were accessing the copy of the table that is no longer available may continue to access the table using a different copy thereof.
A replicated object is represented by a set of copies. This set may be static or dynamic. Static replication requires that the number and the identity of the copies do not change during the lifetime of the replicated object, while dynamic replication allows copies to be added or removed at runtime.
In distributed systems, the two best-known replication policies are active replication and passive replication (also called primary-backup replication). With active replication, all copies of the replicated object play the same role. Thus, when a client sends a request to an actively-replicated object, every replica of the object receives the request, processes it, updates its state in response, and sends a response back to the client. Because each request is sent to every replica, the failure of one of the replicas is transparent to the client.
With passive replication, one replica is designated as the primary replica, while all other replicas are backups. Clients perform requests by sending messages only to the primary replica, which executes the request, updates the other replicas, and sends the response to the client. If the primary replica fails, then one of the backup replicas takes over the responsibility of being the primary replica.
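The primary-backup flow described above can be illustrated with a minimal sketch. The Replica and PassiveGroup classes, the counter state, and the rule that the first live replica acts as primary are all hypothetical choices made for illustration; they are not taken from any particular replication system.

```python
class Replica:
    """One copy of a replicated counter (hypothetical example object)."""

    def __init__(self, name):
        self.name = name
        self.state = 0
        self.alive = True

    def apply_update(self, new_state):
        # Backups do not execute requests; they only install the primary's state.
        self.state = new_state


class PassiveGroup:
    """Primary-backup group: the first live replica acts as primary (assumption)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def primary(self):
        for replica in self.replicas:
            if replica.alive:
                return replica
        raise RuntimeError("no live replica")

    def invoke(self, amount):
        primary = self.primary()
        primary.state += amount                      # execute on the primary only
        for replica in self.replicas:
            if replica is not primary and replica.alive:
                replica.apply_update(primary.state)  # update the backups
        return primary.state                         # reply to the client


group = PassiveGroup([Replica("r1"), Replica("r2"), Replica("r3")])
first_reply = group.invoke(5)
group.replicas[0].alive = False      # the primary fails
second_reply = group.invoke(2)       # a backup takes over as primary
```

In this sketch the failure happens between requests, so the new primary already holds the latest state. The harder case, a failure in the middle of a request, is not covered here.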
The main problem with replication is that conventional replication techniques require replicated objects to be deterministic. An object is said to be deterministic if the outcome of a request issued to the object (an "invocation" of the object) depends only on the state of the object prior to the invocation and the parameters of the invocation. Thus, two deterministic objects having identical states will keep identical states if they both receive the same set of invocations in the same order.
The integrity of a system may be compromised if replicated objects are not deterministic. For example, in an active replication system, if two replicas arrive at different states based on the same input, then switching between the replicas may result in unpredictable behavior. In passive replication systems, the problem created by nondeterministic objects is less apparent, but just as troublesome, in particular when the nondeterministic objects interact with other entities.
FIG. 1 illustrates a scenario in which a client C invokes a single replicated object X, which in turn invokes another replicated object Y and then a non-replicated object Z. The set of nested invocations forms an "invocation tree". In this scenario, Y is aware of replication since it is replicated itself, while Z may not be aware of replication at all. Consequently, it may be assumed that only Y implements mechanisms for dealing with replicated invocations.
The main problem when dealing with replicated objects is to maintain the consistency of the replicated state, i.e., to ensure that all replicas agree on a common state. Consistency must be preserved at all levels of the invocation tree. For instance, in FIG. 1, it is not acceptable that Y receives and processes a request while Z does not because of the failure of X. The replication mechanisms must ensure that either all objects in the invocation tree process their request, or that none of them does. This all-or-nothing property is similar to the atomicity property of a transactional system.
One approach for guaranteeing atomicity of invocations in scenarios such as that shown in FIG. 1 is referred to as the roll-forward approach. The roll-forward approach uses redundancy to ensure that another replica will transparently take over upon failure of the primary without any loss of information. Consistency is maintained by guaranteeing that the invocation will succeed despite failure (by "rolling forward"). Passive replication techniques that use a roll-forward approach ensure that only the primary replica processes the requests. Updates are sent to the backup replicas. If the primary replica fails during the processing of a request, a backup replica is chosen to be the new primary replica. The request during which the failure occurred is then sent to the new primary replica, which continues processing at that point.
It is commonly believed that a passive replication technique with a roll-forward approach is sufficient to support non-deterministic servers. However, these techniques are sufficient only for the failure-free case, because a non-deterministic object may interact with other objects, the identity of which depends on non-deterministic factors. For instance, in FIG. 1, X may invoke Y if some condition is met (e.g., some timer has not yet expired) and Z otherwise. In this situation, X may crash after having invoked Y, but before having updated the backups. The backup that takes over and processes the invocation may invoke Z instead of Y and leave the system in an inconsistent state.
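The failure scenario just described can be reproduced in a few lines. This is only an illustrative sketch: the Target class, the process function, and the timer_expired flag are hypothetical stand-ins for objects X, Y, and Z of FIG. 1 and for the non-deterministic condition.

```python
class Target:
    """Stand-in for objects Y and Z: records the requests it has processed."""

    def __init__(self, name):
        self.name = name
        self.processed = []

    def handle(self, request_id):
        self.processed.append(request_id)


def process(request_id, timer_expired, y, z):
    """X's non-deterministic logic: the callee depends on a timer."""
    target = z if timer_expired else y
    target.handle(request_id)
    return target.name


y, z = Target("Y"), Target("Z")

# The primary replica of X processes the request before the timer expires,
# then crashes before it can update its backups.
primary_choice = process("req-1", timer_expired=False, y=y, z=z)

# The backup takes over and re-executes the same request, but by now the
# timer has expired, so it takes the other branch.
backup_choice = process("req-1", timer_expired=True, y=y, z=z)
```

After the failover, both Y and Z have processed the same client request, although a failure-free execution would have reached only one of them.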
Unfortunately, in many real-world situations and systems, determinism cannot be guaranteed. Thus, there is a need for a system and technique for providing the benefits of object replication while maintaining the accuracy of results in an environment that cannot guarantee that the replicated objects are deterministic.
Techniques are provided for executing an operation in which a client invokes a replicated object. According to one technique, a primary replica of the replicated object receives a first request from the client, wherein the first request includes a request identifier. Rather than immediately attempting to process the request, the primary replica determines whether a record exists that corresponds to the request identifier. If a record exists that corresponds to the request identifier, then the primary replica responds to the first request with a reply associated with the record. If no record exists that corresponds to the request identifier, then the primary replica performs the steps of: starting a transaction; as part of the transaction, processing the request; as part of the transaction, storing a record associated with the request identifier and a reply to the request; committing the transaction; and delivering the reply to the client.
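The steps above can be sketched as follows, using Python's built-in sqlite3 module as a stand-in transactional store. The table names, the handle_request function, the withdrawal operation, and the reply format are assumptions made for illustration only. The sketch also assumes that the record store survives the failure of the primary and is visible to whichever replica takes over.

```python
import sqlite3

# isolation_level=None lets the code issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE replies (request_id TEXT PRIMARY KEY, reply TEXT)")
conn.execute("INSERT INTO account VALUES ('acct-1', 100)")


def handle_request(conn, request_id, amount):
    """Process a withdrawal at most once per request identifier."""
    row = conn.execute(
        "SELECT reply FROM replies WHERE request_id = ?", (request_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # a record exists: answer with the stored reply

    conn.execute("BEGIN")  # start the transaction
    try:
        # Process the request as part of the transaction.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 'acct-1'",
            (amount,),
        )
        balance = conn.execute(
            "SELECT balance FROM account WHERE id = 'acct-1'"
        ).fetchone()[0]
        reply = f"ok balance={balance}"
        # Store the record (request identifier and reply) in the same transaction.
        conn.execute("INSERT INTO replies VALUES (?, ?)", (request_id, reply))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return reply  # deliver the reply to the client


first = handle_request(conn, "req-42", 30)
retry = handle_request(conn, "req-42", 30)  # client re-issues after a failure
```

Because the record and the effects of the request commit together, a re-issued request either finds the record and receives the stored reply, or finds nothing and is processed from a state in which none of its earlier effects survived.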
The client may itself be a replicated object. If the client is nondeterministic, then the transaction initiated by the primary replica may be a nested transaction relative to a transaction executed by the client, or be executed as part of the same transaction as the client. If the transaction executed by the primary replica is executed as part of the same transaction as the client, then a savepoint may be established before the primary replica processes the request, so that a failure will not necessarily require all processing performed by the client to be rolled back.
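The effect of a savepoint can be illustrated with a small sketch, again using sqlite3 as a stand-in transactional store. The log table, the savepoint name, and the simulated failure are hypothetical; the sketch shows only the general savepoint mechanism, not a specific implementation of the technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (step TEXT)")

conn.execute("BEGIN")                                  # the client's transaction
conn.execute("INSERT INTO log VALUES ('client work')")

conn.execute("SAVEPOINT before_invocation")            # set before the primary processes
try:
    conn.execute("INSERT INTO log VALUES ('partial server work')")
    raise RuntimeError("simulated failure of the primary replica")
except RuntimeError:
    # Undo only the work done since the savepoint; the client's work survives.
    conn.execute("ROLLBACK TO SAVEPOINT before_invocation")

conn.execute("INSERT INTO log VALUES ('retried server work')")
conn.execute("COMMIT")

steps = [row[0] for row in conn.execute("SELECT step FROM log ORDER BY rowid")]
```

Only the work performed after the savepoint is discarded, so the request can be retried without rolling back what the client had already done in the same transaction.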