1. Field of the Invention
This invention relates to enterprise system management and, more particularly, to continuous availability techniques in multi-server networked environments.
2. Description of the Related Art
The impact of system downtime on productivity is increasing as organizations rely more heavily on information technology. Consequently, organizations may seek to minimize downtime through various approaches designed to increase reliability and availability. Ultimately, the goal of many organizations is to ensure the continuous availability of critical systems.
One approach to continuous availability is the use of redundant hardware executing redundant instances of an application in lockstep. If one instance of an application on one unit of hardware fails, then the instance on the other unit of hardware may continue to operate. However, the redundant hardware is often proprietary, and both the redundant and proprietary natures of the hardware yield a cost that may be prohibitive.
To avoid the expense of special-purpose hardware, software techniques may be used to provide failover of an application. For example, cluster management software may support application failover in a networked environment having two or more servers and a shared storage device. If the failure of an application or its host server is sensed, then a new instance of the application may be started on a functioning server in the cluster. However, software-based failover approaches may fail to preserve the entire context of the application instance on the failed server up to the moment of failure. In the wake of a failure, the new instance of the application is typically started anew. In the process, recent transactions and events may be discarded. Other transactions and events may be left in an indeterminate state. The server or its clients may need to initiate new connections to replace connections lost in the failover.
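By way of illustration only, the loss of application context in a software-based failover may be sketched as follows. The class AppInstance, the function failover, and the use of a Python list to stand in for the shared storage device are hypothetical names chosen for this sketch; they do not correspond to any particular cluster management product.

```python
class AppInstance:
    """Toy application instance (hypothetical) hosted on one server."""

    def __init__(self, shared_storage):
        self.shared_storage = shared_storage  # survives a server failure
        self.pending = []                     # in-memory context, lost on failure

    def submit(self, txn):
        # The transaction exists only in this instance's memory until committed.
        self.pending.append(txn)

    def commit(self):
        # Committed transactions reach the shared storage device.
        self.shared_storage.extend(self.pending)
        self.pending.clear()


def failover(shared_storage):
    """Start a new instance on a functioning server.

    Only the shared storage carries over; the failed instance's
    in-memory context is not available to the new instance.
    """
    return AppInstance(shared_storage)


storage = []                    # stands in for the shared storage device
primary = AppInstance(storage)
primary.submit("t1")
primary.commit()
primary.submit("t2")            # still uncommitted when the host server fails

standby = failover(storage)     # new instance started anew
print(standby.shared_storage)   # committed work only
print(standby.pending)          # "t2" was discarded with the failed instance
```

In this sketch the transaction "t2" is discarded because it existed only in the memory of the failed instance, which corresponds to the recent transactions and events described above.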
Debugging software has used techniques for the logging and replay of events encountered by an application. For example, a debugger may log events occurring during execution of a first instance of an application. The debugger may then replay the logged events from the beginning by means of instrumentation of the application, typically using recompilation or other techniques prior to replay. However, recompilation may not be available for off-the-shelf application software, and static instrumentation may often yield an unacceptable performance penalty for software in a production environment. Furthermore, replay of logged events may be performed inefficiently in a debugging context.
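By way of illustration only, logging and replay of events from the beginning may be sketched as follows. The functions run_application, record, and replay are hypothetical names chosen for this sketch, and the wrapping of the event source stands in, in a greatly simplified form, for the instrumentation a debugger would apply.

```python
import random


def run_application(next_event, steps):
    """Toy application whose final state depends only on the events it consumes."""
    state = 0
    for _ in range(steps):
        state = state * 31 + next_event()
    return state


def record(source, log):
    """Wrap an event source so that every event it yields is appended to log."""
    def next_event():
        event = source()
        log.append(event)
        return event
    return next_event


def replay(log):
    """Return an event source that yields the logged events from the beginning."""
    events = iter(log)
    return lambda: next(events)


log = []
rng = random.Random()  # nondeterministic input for the first instance

# First instance: execute while logging every event encountered.
original = run_application(record(lambda: rng.randint(0, 9), log), steps=5)

# Second instance: execute against the logged events instead of live input.
replayed = run_application(replay(log), steps=5)

assert original == replayed
```

Because the second instance consumes exactly the logged events in their original order, it reaches the same state as the first. The sketch also shows why replay from the beginning may be inefficient: every logged event is reprocessed regardless of how long the first instance had been running.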
It is desirable to provide improved methods and systems for continuously available execution environments.