Current recovery techniques for transactions performed via a client-server system typically use logs kept at servers to recover from system crashes. When needed, these centralized logs make it possible to roll the system back to an earlier state and to replay the relevant operations.
Fault recovery techniques are typically sub-divided into phases: one phase that records information while a computer system is operating normally; and another phase that uses the previously recorded information to perform recovery after a computer-system failure.
During the recording phase, before committing to performing an operation, such as a transaction, the fault recovery system may “flush” or write log information to disk storage. Then, the system will be ready to commit to performing the transaction. Accordingly, if there is a failure while the transaction is being performed, the system can determine from the log whether or not the transaction was successfully completed, and can recover accordingly. If the transaction had committed when the fault occurred, then the transaction can be replayed during the fault recovery phase. If the transaction had not committed when the fault occurred, then during the recovery phase any operations that were performed before the fault occurred (as part of trying to perform the transaction) can be undone to return the system to the state or condition it was in before it started trying to perform the transaction.
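The recording and recovery phases described above can be illustrated with a minimal sketch. It is not drawn from any particular system: the function names (`append_record`, `run_transaction`, `recover`, `demo`), the JSON record layout, and the `crash_before_commit` flag are all hypothetical, and the sketch assumes transactions run one at a time and that a key with no prior value is logged with an old value of `None`.

```python
import json
import os
import tempfile


def append_record(log_path, record):
    """Append one log record and flush it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())


def run_transaction(log_path, state, txn_id, updates, crash_before_commit=False):
    """Recording phase: log each update before applying it, then log the commit."""
    for key, new_value in updates.items():
        append_record(log_path, {"txn": txn_id, "type": "update", "key": key,
                                 "old": state.get(key), "new": new_value})
        state[key] = new_value
    if crash_before_commit:
        return  # simulated failure: no commit record reaches the log
    append_record(log_path, {"txn": txn_id, "type": "commit"})


def recover(log_path, state_at_crash):
    """Recovery phase: replay committed updates, undo uncommitted ones."""
    with open(log_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    committed = {r["txn"] for r in records if r["type"] == "commit"}
    state = dict(state_at_crash)
    # Replay updates of committed transactions in log order.
    for r in records:
        if r["type"] == "update" and r["txn"] in committed:
            state[r["key"]] = r["new"]
    # Undo updates of uncommitted transactions in reverse log order.
    for r in reversed(records):
        if r["type"] == "update" and r["txn"] not in committed:
            if r["old"] is None:
                state.pop(r["key"], None)
            else:
                state[r["key"]] = r["old"]
    return state


def demo():
    """One committed deposit, then a second transaction that fails mid-way."""
    log_path = os.path.join(tempfile.mkdtemp(), "txn.log")
    state = {"balance": 100}
    run_transaction(log_path, state, "t1", {"balance": 150})
    run_transaction(log_path, state, "t2", {"balance": 999}, crash_before_commit=True)
    return recover(log_path, state)
```

In this sketch, calling `demo()` leaves the committed change from `t1` in place and discards the partial change from `t2`, so the recovered state is either a pre-transaction or a post-transaction state and never something in between.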
Accordingly, with conventional fault-recovery processing for transactions, the transaction processing system begins in a consistent (i.e., known) state, and an all-or-nothing change of state is applied depending upon whether the transaction is performed or not. The transaction-processing system then ends up in a consistent state, namely, either the original (pre-transaction) state, or a new post-transaction state. This makes it possible to log, for example, bank transactions such as depositing money to a checking account.
In an existing client/server interaction for purchasing a book, for instance, the client computer sends a specific request to the server, the server processes the request on behalf of the client, and the server then returns the result of the purchase operation to the client. There might be several message exchanges to gather the necessary information. But once the server gets the necessary information from the client, the state of the operation, also referred to as the fault-recovery information for the operation, is logged at the server and the overall transaction is either committed or aborted. Thus, the log records that describe this operation are stored at the server. The client is typically not responsible for tracking or preserving this fault-recovery information, and client computers generally do not retain fault-recovery information regarding past interactions with various servers. Instead, the servers tend to preserve this type of information.
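The division of responsibility in the book-purchase example can be sketched as follows. The `Server` and `Client` classes, the tuple-shaped log records, and the in-memory `log` list standing in for the server's durable log are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    stock: dict
    # Fault-recovery records for every interaction are kept only here.
    log: list = field(default_factory=list)

    def purchase(self, client_id, title):
        """Log the operation, then commit or abort it as a whole."""
        self.log.append(("begin", client_id, title))
        if self.stock.get(title, 0) > 0:
            self.stock[title] -= 1
            self.log.append(("commit", client_id, title))
            return "purchased"
        self.log.append(("abort", client_id, title))
        return "out of stock"


@dataclass
class Client:
    client_id: str

    def buy(self, server, title):
        # The client receives only the result; it stores no recovery records.
        return server.purchase(self.client_id, title)
```

In this sketch a `Client` object holds nothing beyond its identifier, so if the `Server` object and its `log` were lost, the clients would have no record from which the purchase history could be reconstructed.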
This conventional fault-recovery approach excludes clients from participating in recovery from a fatal failure of a server. Were a server to be destroyed, the clients that have interacted with the server would typically be unable to participate in the recovery of the server. Furthermore, conventional approaches to fault recovery typically do not extend to pure peer-to-peer computing, where there is no central server to preserve transaction state.
Accordingly, improved fault-recovery techniques that allow client computers to participate in fault recovery of a server and that apply to peer-to-peer systems would be desirable.