1. Field of the Invention
This invention relates to fault tolerant systems, and particularly to methods, systems and computer program products for architecting fault tolerant applications.
2. Description of Background
Fault tolerance is a computer system property that enables the system to continue operating properly in the event of the failure of some of its components. As such, if the system's operating quality decreases at all, the decrease is proportional to the severity of the failure. Typically, fault tolerance is achieved by generic techniques that apply to all the applications running on the system. Such techniques may include: a disaster recovery system; fault tolerant system software; kernel support for fault tolerance; and hardware support for fault tolerance.
However, currently existing solutions to fault tolerance handle system faults in a generic way. Currently, an application program relies on generic methods of fault tolerance, such as the underlying operating system (OS) and firmware, to save itself when a problem occurs in the system. As such, the application program itself cannot do anything specific to tolerate the fault. For example, suppose an application program uses an Ethernet card for its network connection, and after some time the Ethernet card stops working properly. Typically, the OS fails over to a new (redundant) Ethernet card and gives the new Ethernet card the same identity as the failed Ethernet card. This operation happens transparently to the application program, which keeps on running even in the event of an Ethernet card failure. Here, the fault tolerance is provided to all the applications running on the system. This method of handling the fault is a generic one, irrespective of the type of application program.
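By way of illustration only, the generic failover described in the Ethernet card example might be sketched as follows. This is a simplified model under stated assumptions and not an actual operating system interface: the names NetworkCard, GenericFailoverLayer, send, and the identity string are all hypothetical and are introduced solely for this sketch.

```python
class NetworkCard:
    """Hypothetical stand-in for an Ethernet card."""

    def __init__(self, name):
        self.name = name
        self.identity = None   # e.g. the address the card answers to
        self.healthy = True

    def send(self, payload):
        if not self.healthy:
            raise IOError(f"{self.name} has failed")
        return f"{self.identity} sent {payload!r} via {self.name}"


class GenericFailoverLayer:
    """Hypothetical OS-level layer applying one policy to every application."""

    def __init__(self, primary, spare, identity):
        self.active = primary
        self.spare = spare
        self.active.identity = identity

    def send(self, payload):
        try:
            return self.active.send(payload)
        except IOError:
            if self.spare is None:
                raise  # no redundant card left, so the fault surfaces
            # Fail over: the redundant card takes the failed card's identity.
            self.spare.identity = self.active.identity
            self.active, self.spare = self.spare, None
            return self.active.send(payload)


# The application only ever calls send(); it is never told about the fault
# and has no hook for choosing a different, application-specific response.
eth0, eth1 = NetworkCard("eth0"), NetworkCard("eth1")
net = GenericFailoverLayer(eth0, eth1, "00:11:22:33:44:55")
before = net.send("hello")
eth0.healthy = False          # simulate the card failure
after = net.send("hello")     # still succeeds, now through eth1
```

In this sketch the calling code sees the same result format before and after the failure, which reflects the transparency noted above; it also shows that the application has no point at which it could apply a policy of its own.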
As such, an application programmer for the system is left with no option other than to rely on the generic mechanism to handle the fault for the application. In addition, the application programmer has no way of designing a fault handling approach that the application programmer believes to be more appropriate in the context of the particular application. It would be desirable to have a fault tolerant system that provides the application programmer with programming structures that help in designing a better way of handling the fault for the application, without blindly relying on the generic solutions.