1. Field of the Invention
This invention is related to backup and restore of computer data in a cluster of computer systems.
2. Description of the Related Art
In many computer applications, high availability is a priority. For example, application servers provide service to clients over a network, such as a local area network (LAN), wide area network (WAN), the Internet, etc. Having the application servers available at all times (or as close to all times as practical) may be critical to the clients' needs. If the application server belongs to a business and the clients are customers of the business, the business may lose sales if the application server is not available.
One mechanism for providing high availability is to cluster two or more computers using cluster server software. For example, the VERITAS Cluster Server™ line of products available from VERITAS Software Corporation (now owned by Symantec Corp.) may be used, although other software from other vendors exists. The cluster server software generally monitors operation of the computers in the cluster (often referred to as “nodes”), and may “fail over” an application server from one node to another to maintain high availability. Fail over may occur due to a failure (software or hardware) detected on the node being failed away from, or may occur to more evenly balance application load within the cluster.
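For illustration only, the two fail over triggers described above (a detected failure, or load balancing) may be sketched as follows. The function names, the `(healthy, load)` representation of a node, and the `imbalance_threshold` parameter are hypothetical and are not taken from any particular cluster server product.

```python
def choose_failover_target(nodes, current):
    """Return the least-loaded healthy node other than `current`, or None.

    `nodes` maps a node name to a (healthy, load) tuple (hypothetical model).
    """
    candidates = [
        (load, name)
        for name, (healthy, load) in nodes.items()
        if healthy and name != current
    ]
    return min(candidates)[1] if candidates else None


def should_fail_over(nodes, current, imbalance_threshold=0.5):
    """Decide whether the application server should leave `current`.

    Fail over is indicated when the current node has failed, or when another
    healthy node is less loaded by more than `imbalance_threshold`.
    """
    target = choose_failover_target(nodes, current)
    if target is None:
        return False  # no healthy node is available to fail over to
    healthy, load = nodes[current]
    if not healthy:
        return True  # failure detected on the node being failed away from
    return load - nodes[target][1] > imbalance_threshold
```

In this sketch a failed node always triggers fail over when a healthy target exists, while a healthy node is only left when the load difference exceeds the assumed threshold.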
Generally, application servers that execute on the cluster include one or more shared resources that need to be available to each node in the cluster in order for a given node to execute the application server. The shared resources may include files stored on a shared storage medium, as well as properties of the application server (e.g. an Internet protocol (IP) address assigned to the application server and used by clients to contact the application server, other Transmission Control Protocol (TCP)/IP settings, etc.). In contrast, local resources may be resources on a particular node (e.g. files on the node, node properties, etc.).
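The distinction between shared and local resources may be modeled, purely as an illustrative sketch, as shown below. The class names `Resource` and `ApplicationServer`, the `shared` flag, and the `can_host` helper are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    name: str
    shared: bool  # True if every node needs access to run the application server


@dataclass
class ApplicationServer:
    name: str
    resources: list = field(default_factory=list)

    def shared_resources(self):
        """Resources that must be available to each node in the cluster."""
        return [r for r in self.resources if r.shared]

    def local_resources(self):
        """Resources that belong to one particular node."""
        return [r for r in self.resources if not r.shared]


def can_host(visible_names, app):
    """A node can execute `app` only if it can reach every shared resource."""
    return all(r.name in visible_names for r in app.shared_resources())
```

For example, an application server whose shared resources are an IP address and a data volume could be hosted only by a node for which `can_host` finds both names in the set of resources visible to that node.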
Another aspect of ensuring high availability is to regularly back up the computer data, to avoid data loss in the event of a significant failure. For example, hardware or software failures may corrupt application data, requiring a backup copy to stem the loss of data. A power failure may bring down the whole cluster, preventing a fail over to keep the application server available. Other failures in the environment (e.g. natural disasters, attack, etc.) may require relocating the application server to an alternate site that is physically distant from the cluster (often termed a “disaster recovery” site). Typically, each node in the cluster is backed up and the application servers are also backed up. Restoring generally includes restoring each failed node in the cluster, reactivating the cluster, and then restoring the application servers onto the cluster. Part of restoring a node includes inhibiting a restore of the shared data for an application server that was on the node at the time of backup, in case the application server failed over to another (non-failing) node and is still executing.
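The conventional restore ordering described above, including the inhibited restore of shared data, may be sketched as follows. This is a simplified illustration under stated assumptions: backups are represented as lists of item names, and the functions `plan_node_restore` and `restore_cluster` are hypothetical and only build a list of steps rather than performing any restore.

```python
def plan_node_restore(backup_items, shared_items):
    """Split a node's backup into items to restore and items to skip.

    Shared data is skipped because the application server may have failed
    over to another node and may still be using that data.
    """
    to_restore, skipped = [], []
    for item in backup_items:
        (skipped if item in shared_items else to_restore).append(item)
    return to_restore, skipped


def restore_cluster(failed_nodes, node_backups, shared_items, app_servers):
    """Return the ordered restore steps for a cluster (no side effects)."""
    steps = []
    # 1. Restore each failed node, inhibiting the restore of shared data.
    for node in failed_nodes:
        to_restore, _ = plan_node_restore(node_backups[node], shared_items)
        steps.append(("restore_node", node, to_restore))
    # 2. Reactivate the cluster once the nodes are back.
    steps.append(("reactivate_cluster",))
    # 3. Restore the application servers onto the cluster.
    for app in app_servers:
        steps.append(("restore_application_server", app))
    return steps
```

The sketch makes the ordering dependency explicit: application servers are restored only after the cluster has been reactivated, which is why the procedure presumes that a cluster exists at the restore target.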
Oftentimes, when a disaster occurs and relocation to the disaster recovery site is needed, it is sufficient merely to get the application server running again. It may not be desirable to have a cluster at the disaster recovery site, either for cost reasons or to reduce complication at the disaster recovery site. However, since the backups were made from the cluster, it is not easy to restore the application server in a non-clustered environment and get it up and running. Typically, a great deal of manual work by a highly knowledgeable administrator is needed, lengthening recovery time and making the recovery process error-prone. Furthermore, if an application server was previously highly available but the costs of the cluster now outweigh the benefits, again there is no easy mechanism to consolidate the application server onto a single computer.