1. Field of the Invention
The present invention generally relates to a computer system, and more particularly to a method and system for proactively reducing outage duration by using predicted outages to trigger and manage existing failure-recovery functionality.
2. Description of the Related Art
When a computer system suffers an unplanned failure, a certain amount of time is required to recover from the failure. If the computer is a single-node, stand-alone computer system, it must reboot and restart its application. If the computer is part of a multi-node, high-availability cluster architecture, it must fail over (i.e., transfer) the application to another node in the cluster.
During this recovery time, after either rebooting or failing over the application to another node in a cluster environment, the recovering system must reload a stale copy of its state from disk, load a transaction redo log from disk, and attempt to reconstruct an up-to-date copy of that state by replaying the transaction redo log against the stale state.
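By way of illustration only, the recovery sequence described above (reload the stale state, load the redo log, replay the log against the stale state) might be sketched as follows. The record layout, the function name replay_redo_log, and the checkpoint sequence number are hypothetical and are not taken from any particular system; an in-memory dictionary and list stand in for the state and log that would be read from disk.

```python
from typing import Dict, List, Tuple

# Hypothetical redo-log record: (sequence number, key, new value).
LogRecord = Tuple[int, str, int]


def replay_redo_log(
    stale_state: Dict[str, int],
    checkpoint_seq: int,
    redo_log: List[LogRecord],
) -> Dict[str, int]:
    """Rebuild an up-to-date state from a stale copy plus a redo log.

    Records with a sequence number at or below checkpoint_seq are
    assumed to be already reflected in the stale state and are skipped.
    """
    state = dict(stale_state)  # work on a copy; leave the input untouched
    for seq, key, value in sorted(redo_log):  # replay in sequence order
        if seq <= checkpoint_seq:
            continue  # already captured in the stale copy
        state[key] = value  # re-apply the logged update
    return state


# Stale copy as of sequence number 2, followed by two later updates.
stale = {"balance_a": 100, "balance_b": 50}
log = [
    (1, "balance_a", 100),
    (2, "balance_b", 50),
    (3, "balance_a", 70),
    (4, "balance_b", 80),
]
recovered = replay_redo_log(stale, checkpoint_seq=2, redo_log=log)
print(recovered)
```

In this sketch the replay cost grows with the number of log records that follow the checkpoint, which is consistent with the observation that recovery time depends on the length of the log.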
Depending on the amount of state and the length of the log, this recovery can take a long time (e.g., on the order of hours). Thus, it is highly desirable to find a means of reducing this outage time. However, prior to the present invention, no such means has been known.
The availability achieved by the system is a strong function of the time required to perform the recovery. If the outage recovery time is halved while the number of failures remains unchanged, the unavailability is also halved. Hence, if a system having an availability of 0.999 can halve its outage recovery time, then its availability climbs to 0.9995. Such an improvement can be a significant competitive advantage in an industry whose customers are showing an increasing concern for availability.
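The arithmetic behind these figures can be checked with a short sketch. It assumes a simple model in which availability is one minus the fraction of a fixed observation period spent in outage, with the number of failures held constant; the 10,000-hour period and 10-hour outage total are illustrative values chosen only to yield an availability of 0.999.

```python
def availability(period_hours: float, outage_hours: float) -> float:
    """Fraction of the observation period during which the system is up."""
    return 1.0 - outage_hours / period_hours


# Illustrative values (assumptions): 10 hours of outage in 10,000 hours.
period = 10_000.0
outage = 10.0

before = availability(period, outage)
after = availability(period, outage / 2)  # recovery time halved

print(f"availability before: {before:.4f}")
print(f"availability after:  {after:.4f}")
print(f"unavailability ratio: {(1 - after) / (1 - before):.2f}")
```

Under this model the unavailability falls from 0.001 to 0.0005, so the availability moves from 0.999 to 0.9995.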
However, as mentioned above, prior to the present invention, no such system (or means) has been known or developed.