Database recovery alone is insufficient for masking failures from applications and users. Transaction atomicity guarantees only all-or-nothing execution, not exactly-once execution, of user requests. Application programs therefore need explicit code for retrying failed transactions. Such code is often incomplete or missing, in which case failures are exposed to the user. Worse, a failure may occur without any notice at all, for example when the system executing the application crashes. For an e-commerce service, such behavior is embarrassing and inconvenient for the user. On the other hand, neither the application program nor the user may blindly re-initiate a request merely because no positive return code has been received, since the request may nevertheless have succeeded. For this reason, some e-services warn users not to hit the checkout/buy/commit button twice, even if there appears to be a long service outage from the user's viewpoint.
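By way of illustration only, the gap between all-or-nothing and exactly-once execution can be shown with a minimal Python sketch. The names OrderService, ReplyLost, and blind_retry are hypothetical and introduced solely for this example; the sketch assumes an in-memory service whose reply is lost once after the transaction has already committed.

```python
class ReplyLost(Exception):
    """Raised when the reply to a committed request never reaches the caller."""


class OrderService:
    """Hypothetical server whose place_order is atomic (all-or-nothing)."""

    def __init__(self, lose_first_reply=False):
        self.orders = []                  # committed state
        self._lose_reply = lose_first_reply

    def place_order(self, item):
        # The transaction commits atomically: the order is fully recorded.
        self.orders.append(item)
        if self._lose_reply:
            # Simulate a crash or outage after commit but before the reply.
            self._lose_reply = False
            raise ReplyLost("commit succeeded, reply lost")
        return "ok"


def blind_retry(service, item, attempts=3):
    """Naive client: re-initiates the request whenever no reply arrives."""
    for _ in range(attempts):
        try:
            return service.place_order(item)
        except ReplyLost:
            continue  # the client cannot tell whether the commit happened
    raise RuntimeError("gave up")


service = OrderService(lose_first_reply=True)
result = blind_retry(service, "book")
order_count = len(service.orders)  # number of times the request took effect
```

In this sketch every individual transaction is atomic, yet the single user request takes effect twice, which is the hazard that the warning against hitting the buy button twice is meant to avoid.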
Fault tolerance for systems of communicating processes has been studied. However, the primary focus has been on long-running computations (e.g., in scientific applications), with distributed checkpointing used to avoid losing too much work to failures. The state exposure that is inherent in message exchanges with human users is addressed by “pessimistic logging,” which involves forced log I/Os for both sender and receiver upon every message exchange. Similar, and sometimes even more expensive, techniques, such as process checkpointing (i.e., state installation onto disk) upon every interaction, were used in the pioneering industrial projects on fault-tolerant business servers in the early 1980s. The current “fail-safe” solutions are limited in that they either require explicit application code for failure handling, require stateless components, or are incapable of handling failures at all levels of a general multi-tier application.
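By way of illustration only, the cost of the pessimistic logging discipline described above can be sketched in Python as follows. The class ForcedLog, the function exchange, and the log record layout are assumptions made for this example and do not reflect any particular prior-art system; network transport is omitted, and os.fsync stands in for the forced log I/O.

```python
import json
import os
import tempfile


class ForcedLog:
    """Append-only log in which every record is forced to stable storage."""

    def __init__(self, path):
        self.path = path
        self.forced_writes = 0

    def force(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # forced log I/O: block until the record is on disk
        self.forced_writes += 1

    def records(self):
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


def exchange(sender_log, receiver_log, msg_id, payload):
    """One message exchange: both sides force a log record."""
    sender_log.force({"event": "send", "id": msg_id, "payload": payload})
    # ... the message would travel over the network here (omitted) ...
    receiver_log.force({"event": "receive", "id": msg_id, "payload": payload})
    return payload


workdir = tempfile.mkdtemp()
sender_log = ForcedLog(os.path.join(workdir, "sender.log"))
receiver_log = ForcedLog(os.path.join(workdir, "receiver.log"))

for i, text in enumerate(["add to cart", "checkout"]):
    exchange(sender_log, receiver_log, i, text)

total_forced = sender_log.forced_writes + receiver_log.forced_writes
```

Under these assumptions each message exchange incurs two synchronous disk writes, one at the sender and one at the receiver, so the number of forced I/Os grows linearly with the number of messages.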
In view of the foregoing, there is a need for systems and methods that overcome the limitations and drawbacks of the prior art.