The invention relates generally to computer network servers, and more particularly to computer servers arranged in a server cluster.
A server cluster is a group of at least two independent servers connected by a network and managed as a single system. The clustering of servers provides a number of benefits over independent servers. One important benefit is that cluster software, which is run on each of the servers in a cluster, automatically detects application failures or the failure of another server in the cluster. Upon detection of such failures, failed applications and the like can be terminated and restarted on a surviving server.
Other benefits include the ability for administrators to inspect the status of cluster resources, and accordingly balance workloads among different servers in the cluster to improve performance. Dynamic load balancing is also available. Such manageability also provides administrators with the ability to update one server in a cluster without taking important data and applications offline. As can be appreciated, server clusters are used in critical database management, file and intranet data sharing, messaging, general business applications and the like.
Thus, the failover of an application from one server (i.e., machine) to another may be automatic in response to a software or hardware failure on the first machine, or alternatively may be manually initiated by an administrator. In any event, failing over an application in a manner that is transparent to the application and to the client requires that the application's execution environment be recreated on the other machine. This execution environment comprises distinct parts having different characteristics from one another, a first part of which is the application code. The application code changes very rarely, and thus an application's code environment may be replicated either by installing the application on all of the machines which may run in a cluster, or by installing the application on storage that is shared by all machines in the cluster. When an application needs to be restarted, the exact code is thus available to the cluster.
Another part of the execution environment is the application's data, which changes very regularly. The application's data environment is best preserved by having the application store all of its data files on a shared disk, a task that is ordinarily accomplished by inputting appropriate information via the application's user interface. When an application needs to be restarted, the exact data is thus available to the cluster.
A third part of the execution environment is the application configuration information, which changes occasionally. Applications that are “cluster-aware” (i.e., designed with the knowledge that they may be run in a clustering environment) store their application configuration information in a cluster registry maintained on a shared disk, thus ensuring reliable failover.
However, existing applications that are not cluster-aware (i.e., legacy applications) use their local machine registry to store their application configuration information. For example, Windows NT applications use the WIN32 Registry. As a result, this configuration data is not available to the rest of the cluster. At the same time, it is impractical (and likely very dangerous) to attempt to modify these legacy applications so as to use the cluster registry instead of their local registry. Moreover, it is not feasible to transparently redirect each of the local registries in the various machines to the cluster registry, and it is costly to replicate copies of each of the local registries to the various machines. Nevertheless, in order to ensure correct and transparent behavior after a failover, the application configuration information needs to be recreated at the machine on which the application is being restarted.
The present invention provides a method and system for transparently failing over resource configuration information stored by a resource (such as an application) on a local machine. More particularly, the application configuration information written to a registry of a local machine is made available to other machines of the cluster. The other machines can rapidly obtain this application configuration information and use it to recreate the application's execution environment on another machine in the cluster, ensuring a rapid and transparent failover operation.
Briefly, the present invention transparently fails over a legacy application by tracking and checkpointing changes to application configuration information that is stored locally, such as in a system's local registry. When an application running on the first system makes a change to the application configuration information in a subtree of the registry, the change is detected by a notification mechanism. A snapshot mechanism is notified, takes a snapshot of the subtree's data, and causes it to be written to a storage device shared by systems of the cluster. When the application is failed over to a second system, the snapshot for that application is retrieved from the shared storage device (e.g., the quorum disk) by a restore mechanism and written to the registry of the second system in a corresponding subtree. The application is then run on the second system using the restored application configuration information for that application.
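By way of illustration only, the notify, snapshot, and restore sequence summarized above might be sketched as follows in Python. This is a simplified model under stated assumptions, not the described implementation: the `LocalRegistry` class, the `checkpoint` and `restore` functions, the JSON snapshot format, and the use of a temporary directory in place of the shared storage device are all hypothetical stand-ins, and no actual registry or cluster API is called.

```python
import json
import tempfile
from pathlib import Path


class LocalRegistry:
    """Hypothetical in-memory stand-in for one system's local registry."""

    def __init__(self):
        self.values = {}    # full key path -> value
        self.watchers = {}  # watched subtree -> notification callback

    def watch(self, subtree, callback):
        """Register a callback to run when any key under `subtree` changes."""
        self.watchers[subtree] = callback

    def set_value(self, key, value):
        """Write a value, then notify the watcher of each enclosing subtree."""
        self.values[key] = value
        for subtree, callback in self.watchers.items():
            if key == subtree or key.startswith(subtree + "\\"):
                callback(subtree)

    def subtree(self, subtree):
        """Return a copy of every key/value pair under `subtree`."""
        prefix = subtree + "\\"
        return {k: v for k, v in self.values.items()
                if k == subtree or k.startswith(prefix)}


def snapshot_path(shared_dir, app_name):
    # Assumed naming scheme: one snapshot file per application.
    return Path(shared_dir) / f"{app_name}.checkpoint.json"


def checkpoint(registry, subtree, shared_dir, app_name):
    """Snapshot the subtree and write it to the (simulated) shared storage."""
    target = snapshot_path(shared_dir, app_name)
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(registry.subtree(subtree)))
    tmp.replace(target)  # swap in the finished snapshot in one step


def restore(registry, shared_dir, app_name):
    """Read the snapshot and write it into the corresponding local subtree."""
    data = json.loads(snapshot_path(shared_dir, app_name).read_text())
    registry.values.update(data)  # direct write, so no watcher is triggered
    return data


def demo():
    with tempfile.TemporaryDirectory() as shared:
        first, second = LocalRegistry(), LocalRegistry()
        subtree = r"SOFTWARE\LegacyApp"
        first.watch(subtree, lambda s: checkpoint(first, s, shared, "LegacyApp"))
        first.set_value(subtree + r"\Port", 8080)          # checkpointed
        first.set_value(r"SOFTWARE\Other\Key", "ignored")  # outside the subtree
        restore(second, shared, "LegacyApp")                # failover to second
        return second.values


if __name__ == "__main__":
    print(demo())
```

In this sketch only changes under the watched subtree trigger a snapshot, so the second system receives the legacy application's configuration and nothing else from the first system's registry.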
Other benefits and advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which: