The present invention relates generally to high availability of services, and more particularly to a server cluster that achieves high availability via a hybrid volume protection scheme that includes both sharing and mirroring a volume.
Most mission critical applications are dependent on programs and data which are stored on volumes of a hard disk device. When the hard disk device becomes unavailable, the mission critical applications cease to function properly, resulting in user downtime and lost productivity.
Server clusters and high availability (HA) software products exist that protect both mission critical applications and the data on which the mission critical applications depend. In general, HA software products move an application and all dependent resources from a first computer system of a server cluster to a second computer system of the server cluster in response to a failure of the first computer system. As a result of moving all dependent resources, the second computer system of the server cluster can continue to provide the application even after a failure of the first computer system.
In order to enable movement of data resources between computer systems of the server cluster, existing HA software products generally protect the data resources on which applications depend by either (i) storing the data resources on a shared volume of a shared storage device, or (ii) mirroring the data resources from a first storage device to a second storage device. The shared volume approach generally requires that all computer systems of the server cluster have a direct connection to the shared storage device. By having a direct connection to the shared storage device, each computer system of the server cluster has direct access to the data resources of the shared volume; however, the shared volume is generally owned by only a single computer system of the server cluster at any point in time in order to maintain data integrity on the shared volume.
Due to the shared nature of the data resources, only a single copy of the data resources need be maintained on the shared volume in order for each computer system of the server cluster to have access to the data resources. Moreover, the HA software may move the data resources associated with an application by simply updating the ownership of the shared volume. While the direct connections to the shared storage device enable each computer system to share data resources, the direct connections also generally require that all computer systems of the server cluster be within reasonably close proximity to one another. Accordingly, known server clusters implemented with shared volumes have precluded the possibility of using any of the computer systems of the server cluster for remote disaster recovery.
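The ownership rule of the shared volume approach can be illustrated with a minimal sketch. The names `SharedVolume`, `attached_servers`, `owner`, and `transfer_ownership` are hypothetical and are not taken from any particular HA software product; the sketch merely assumes that relocating the data resources amounts to updating a single ownership field while the single copy of the data stays in place.

```python
class SharedVolume:
    """A volume on a shared storage device reachable by every attached server."""

    def __init__(self, name, attached_servers, owner):
        if owner not in attached_servers:
            raise ValueError("owner must have a direct connection to the device")
        self.name = name
        self.attached_servers = set(attached_servers)
        self.owner = owner   # exactly one owner at any point in time
        self.blocks = {}     # the single copy of the data resources

    def write(self, server, key, value):
        # Only the current owner may modify the volume, preserving data integrity.
        if server != self.owner:
            raise PermissionError(f"{server} does not own {self.name}")
        self.blocks[key] = value

    def transfer_ownership(self, new_owner):
        # Moving the data resources is only an ownership update; no data is copied.
        if new_owner not in self.attached_servers:
            raise ValueError(f"{new_owner} is not connected to the shared device")
        self.owner = new_owner


volume = SharedVolume("vol0", ["server_a", "server_b"], owner="server_a")
volume.write("server_a", "record", 1)
volume.transfer_ownership("server_b")   # for example, after server_a fails
volume.write("server_b", "record", 2)   # same single copy, new owner
```

In this sketch a server that lacks a direct connection to the shared device can never become the owner, which mirrors the proximity limitation noted above.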
On the other hand, the mirrored volume approach generally requires that all computer systems of the server cluster maintain a separate copy or mirror of the data resources stored on the mirrored volume. As a result of maintaining a separate copy of the data resources, computer systems of the server cluster can be remotely located from one another. While the mirrored volume approach enables the possibility of using a computer system of the server cluster for remote disaster recovery, the mirrored volume approach is usually more difficult to configure. In particular, the mirrored volume approach often involves network-related routing issues that arise when a service is relocated from one physical location on the network to another physical location on the network. Moreover, the mirrored volume approach requires more storage space since each computer system of the server cluster maintains a separate mirrored copy of the protected data resources.
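The storage cost of the mirrored volume approach can be illustrated with a minimal sketch. The class `MirroredVolume` and its methods are hypothetical, and the sketch assumes for simplicity that every write is applied to every copy immediately; an actual product may replicate over a network and on a different schedule.

```python
class MirroredVolume:
    """A protected volume of which every server keeps its own full copy."""

    def __init__(self, servers):
        # One independent copy per server; no storage device is shared.
        self.copies = {server: {} for server in servers}

    def write(self, key, value):
        # Assumption: each write is applied to every copy immediately.
        for copy in self.copies.values():
            copy[key] = value

    def stored_items(self):
        # Total items held across all copies, as a rough measure of storage used.
        return sum(len(copy) for copy in self.copies.values())


mirror = MirroredVolume(["local_server", "remote_server"])
mirror.write("record", 1)
mirror.write("other", 2)
total = mirror.stored_items()   # two items held on each of two servers
```

Because each server holds a complete copy, the servers need not share a storage device and may be located at different sites, at the cost of storing the protected data once per server.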
Therefore, a need exists for a method and apparatus that implement a hybrid volume protection scheme in order to obtain a high availability scheme with many of the advantages of both shared volumes and mirrored volumes.
In accordance with one embodiment of the present invention, there is provided a method of providing high availability of a service. One step of the method includes allocating the service and a shared volume of a first mass storage device associated with the service to a first server of a first subcluster that is located at a first site and that includes servers which share the first mass storage device. Another step of the method includes mirroring the shared volume to a second mass storage device of a second subcluster that is located at a second site and that includes at least one server in order to obtain a first mirrored copy of the shared volume at the second site. Yet another step of the method includes determining to reallocate the service to a first server of the second subcluster. The method also includes the step of allocating the first mirrored copy to the first server of the second subcluster. Moreover, the method includes the step of allocating the service to the first server of the second subcluster in response to the step of determining to reallocate the service to the first server of the second subcluster.
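The steps of the method can be sketched as follows. This is an illustrative sketch only and not the claimed implementation: the names `Subcluster`, `Service`, `allocate_service_and_volume`, `mirror_volume`, and `reallocate` are hypothetical, the volume is modeled as an in-memory dictionary, and the decision to reallocate is reduced to a flag that is assumed to indicate a failure of the first site.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subcluster:
    site: str
    servers: list
    volume: dict = field(default_factory=dict)   # volume on the site's mass storage device
    volume_owner: Optional[str] = None


@dataclass
class Service:
    name: str
    server: Optional[str] = None


def allocate_service_and_volume(service, subcluster, server):
    # Allocate the shared volume and the service to a server of the subcluster.
    subcluster.volume_owner = server
    service.server = server


def mirror_volume(source, target):
    # Obtain a mirrored copy of the shared volume at the other site.
    target.volume = dict(source.volume)


def reallocate(service, target_subcluster, server):
    # Allocate the mirrored copy first, then the service, to the target server.
    target_subcluster.volume_owner = server
    service.server = server


first = Subcluster("site_1", ["s1a", "s1b"], volume={"record": 1})
second = Subcluster("site_2", ["s2a"])
service = Service("database")

allocate_service_and_volume(service, first, first.servers[0])
mirror_volume(first, second)

first_site_available = False   # assumption: the first site has failed
if not first_site_available:   # determining to reallocate the service
    reallocate(service, second, second.servers[0])
```

After the final step the service runs on the server at the second site and uses the mirrored copy, which is independent of the volume at the first site.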
Pursuant to another embodiment of the present invention, there is provided a server cluster for providing high availability of a service. The server cluster includes a first mass storage device located at a first site, a second mass storage device located at a second site, a first subcluster located at the first site, a second subcluster located at the second site, and a cluster manager. The first mass storage device includes at least one volume associated with the service. Similarly, the second mass storage device includes at least one volume associated with the service. The first subcluster includes a plurality of servers operably coupled to the first mass storage device. Moreover, the second subcluster includes at least one server operably coupled to the second mass storage device. The cluster manager is operable to allocate the service and the at least one volume of the first mass storage device to a first server of the first subcluster, and mirror the at least one volume of the first mass storage device to the at least one volume of the second mass storage device. Moreover, the cluster manager is operable to determine to reallocate the service to a first server of the second subcluster, allocate the at least one volume of the second mass storage device to a first server of the second subcluster, and allocate the service to the first server of the second subcluster in response to determining to reallocate the service to the first server of the second subcluster.
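One way a cluster manager might determine where to reallocate the service is sketched below. The selection policy is an assumption made for illustration and is not stated in the description above: the sketch prefers a surviving server of the first subcluster, which can take over the shared volume, and falls back to a server of the second subcluster, which uses the mirrored volume. The function name `choose_failover_target` and its parameters are hypothetical.

```python
def choose_failover_target(failed_server, first_subcluster, second_subcluster, healthy):
    """Return (server, volume_source) for the relocated service."""
    # Assumed policy: a surviving peer at the first site takes over the shared volume.
    for server in first_subcluster:
        if server != failed_server and server in healthy:
            return server, "shared volume"
    # Otherwise a server at the second site takes over using the mirrored volume.
    for server in second_subcluster:
        if server in healthy:
            return server, "mirrored volume"
    raise RuntimeError("no healthy server available")


first_subcluster = ["a1", "a2"]
second_subcluster = ["b1"]

# A single server fails: the service stays at the first site.
after_server_failure = choose_failover_target(
    "a1", first_subcluster, second_subcluster, healthy={"a2", "b1"})

# The whole first site fails: the service moves to the second site.
after_site_failure = choose_failover_target(
    "a1", first_subcluster, second_subcluster, healthy={"b1"})
```

Under this assumed policy, a server failure is handled by an ownership change within the first subcluster, while a site failure is handled by the mirrored volume at the second site.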
Pursuant to yet another embodiment of the present invention, there is provided a computer readable medium for providing high availability of a service. The computer readable medium includes instructions, which when executed, cause a cluster manager to allocate the service and at least one shared volume of a first mass storage device associated with the service to a first server of a first subcluster located at a first site and comprising a plurality of servers that share the first mass storage device. The computer readable medium also includes instructions, which when executed, cause the cluster manager to mirror the at least one shared volume to a second mass storage device of a second subcluster located at a second site and comprising at least one server in order to obtain a first mirrored copy of the at least one shared volume at the second site. Moreover, the computer readable medium includes instructions, which when executed, cause the cluster manager to determine to reallocate the service to a first server of the second subcluster, allocate the first mirrored copy to the first server of the second subcluster, and allocate the service to the first server of the second subcluster in response to determining to reallocate the service to the first server of the second subcluster.
It is an object of the present invention to provide a new method and apparatus for providing highly available services.
It is another object of the present invention to provide an improved method and apparatus for providing highly available services.
It is yet another object of the present invention to provide a method and apparatus which continue to provide a service even after a server fails.
It is still another object of the present invention to provide a method and apparatus which continue to provide a service even after a site failure has occurred.
The above and other objects, features, and advantages of the present invention will become apparent from the following description and the attached drawings.