The present invention relates generally to fault-tolerant distributed computing systems and, in particular, to a method for dynamically switching fault tolerance schemes in a distributed system based on wait times of user interface events.
Fault tolerance is a key technology in distributed systems for ensuring reliability of operations for user-critical applications such as e-commerce, database transactions, and B2B. A distributed system is a group of computing devices, interconnected by a communication network, that function together to implement an application. Fault tolerance provides reliability of operation from the user's perspective by masking failures in critical system components. Known fault-tolerant mechanisms for distributed systems can use different fault tolerance schemes, including different fault detection and recovery means, to handle various types of failures, such as device and network failures.
However, it is known that fault tolerance schemes differ in their trade-offs between fault tolerance and performance. In the context of interactive applications, a fault tolerance scheme can lengthen the time that a user has to wait for a system response after interacting with the system, particularly in mobile computing environments. This delay can affect user perception of the performance of a system, which is significant because users are known to give up on applications if their requests are not met within certain time limits. Accordingly, it is desirable to limit detrimental trade-offs between fault tolerance and perceived system performance.
Furthermore, different applications may have different requirements for fault tolerance and performance, and these requirements may change over the course of execution of the same application. It may be that no particular implementation of a fault tolerance mechanism will perform well for all applications. In this context, it is important to know when to switch fault tolerance schemes and which scheme to select dynamically.
Therefore, there is a need for a method of dynamically switching fault tolerance schemes that can improve the user-perceived performance of a system while taking into account the desired level of fault tolerance.