In disaster recovery solutions, high availability (HA) of the storage management servers is necessary in case of a primary site failure. In a storage management product such as TotalStorage® Productivity Center for Replication (TPC-R), the HA feature mirrors the state and data of the TPC-R server from a primary site to a secondary site. This feature provides the high availability aspect of the solution by keeping the “brains” of the TPC-R server consistent on two separate servers in case one of the TPC-R servers goes down.
The storage management server that is the primary in an HA relationship is referred to as the active server, while the secondary storage management server is referred to as the standby server. If the primary storage management server goes down, the secondary server can “take over” the HA relationship between the servers and become the new active server. The primary storage management server, however, still considers itself an active server. To use the standby server at all, a user must first issue a takeover command, which places the standby server in an “active” HA state so that it can accept commands.
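The active/standby behavior described above can be illustrated with a minimal sketch. The class, method, and server names below are hypothetical and are not taken from TPC-R; the sketch only assumes that a standby server rejects commands until a takeover is issued, and that the former primary is not informed of the takeover.

```python
from enum import Enum


class HAState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class ManagementServer:
    """Hypothetical model of one storage management server in an HA pair."""

    def __init__(self, name, state):
        self.name = name
        self.state = state

    def takeover(self):
        # Manual administrator action: promote this server to active.
        # Nothing here notifies the other server, so a former primary
        # keeps whatever state it already had.
        self.state = HAState.ACTIVE

    def submit(self, command):
        # Only a server in the active HA state accepts commands.
        if self.state is not HAState.ACTIVE:
            raise PermissionError(f"{self.name} is standby; issue a takeover first")
        return f"{self.name} executed {command}"


primary = ManagementServer("site-a", HAState.ACTIVE)
standby = ManagementServer("site-b", HAState.STANDBY)

standby.takeover()                        # explicit takeover command
result = standby.submit("start-session")  # now accepted by the new active server
```

After the takeover in this sketch, both `primary` and `standby` report the active state, which corresponds to the situation in which the primary storage management server still considers itself an active server.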
One drawback of this HA solution is that the customer has to issue a “takeover” command to the standby server before it starts working and managing the storage devices. In the event of a disaster, this is a manual procedure that an administrator must perform. The solution was designed this way to prevent two storage management servers from managing the same set of storage devices at any given time. Otherwise, either storage management server could encounter errors while the other storage management server is trying to control the storage devices, since neither would know what the other is doing.
Another drawback of this type of HA solution is that when the administrator wants to perform maintenance on the storage management servers, there will at some point be two storage management servers trying to manage the same set of storage devices. This is a problem because the traditional active/standby relationship of the HA servers does not work when two active servers are trying to control the data at the same time.
In the existing active/standby HA methodology, only one storage management server can control a set of devices in a data replication environment. If multiple “active” storage management servers controlled the same data replication devices, each server would act independently of the others in response to different events. For example, if a replication device pair were removed from one of the storage management servers, the alternate storage management server would cause the rest of the replication pairs to suspend and stop copying data. This is undesirable since any new data would not be copied.
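The conflict between two uncoordinated active servers can be sketched as follows. All names are hypothetical, and the sketch assumes, for illustration only, that each server keeps a private list of the replication pairs it expects and treats an unexpectedly missing pair as a failure that requires suspending the remaining pairs.

```python
class ReplicationHardware:
    """Hypothetical shared set of replication pairs on the storage devices."""

    def __init__(self, pairs):
        self.copying = set(pairs)  # pairs currently copying data
        self.suspended = set()     # pairs that have stopped copying


class ActiveServer:
    """Hypothetical active server with its own, unshared view of the pairs."""

    def __init__(self, name, hardware):
        self.name = name
        self.hardware = hardware
        self.known_pairs = set(hardware.copying)

    def remove_pair(self, pair):
        # Intentional removal; the other active server is never told.
        self.known_pairs.discard(pair)
        self.hardware.copying.discard(pair)

    def reconcile(self):
        # Assumed behavior: a pair that vanished without this server's
        # involvement looks like a failure, so all remaining pairs are
        # suspended.
        missing = self.known_pairs - self.hardware.copying - self.hardware.suspended
        if missing:
            self.hardware.suspended |= self.hardware.copying
            self.hardware.copying.clear()
        return missing


hardware = ReplicationHardware({"pair1", "pair2", "pair3"})
server_a = ActiveServer("server-a", hardware)
server_b = ActiveServer("server-b", hardware)

server_a.remove_pair("pair1")   # deliberate change made on one server
missing = server_b.reconcile()  # the other server reacts by suspending the rest
```

In this sketch, a deliberate removal on `server_a` leaves `hardware.copying` empty once `server_b` reconciles its stale view, so no new data is copied for the remaining pairs.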
An embodiment of this invention addresses the above issues by providing for multiple active high availability storage management servers.