Many data storage environments at various enterprises (e.g., companies, educational organizations, government agencies, etc.) operate without downtime on a 24/7 basis (available 24 hours a day, seven days a week). To enable recovery of data in case of failure, data backups are typically performed.
Traditional backup techniques involve backing up data to storage tape. For large storage systems, however, backing up to tape may not be viable, since the relatively large amount of data to be backed up may require an application (e.g., a database application) in the storage system to be taken offline for a relatively long period of time. Taking the database application offline can disrupt the operation of the system, which is undesirable.
To address this issue, a zero downtime backup (ZDB) technique has been proposed, in which the data is backed up to a disk-based storage subsystem instead of directly to storage tape. Data writes to a disk-based storage subsystem are typically much faster than data writes to a tape storage device. By performing backups to a disk-based storage subsystem, the database application does not have to be taken offline for a long period of time, such that there is little impact on the performance of the database application. After the backup data has been written to the disk-based storage subsystem, the backup data can be streamed to a tape storage device without further interruption of the database application, or alternatively, the backup data can simply be kept in the disk-based storage subsystem.
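The two-stage flow described above can be illustrated with a minimal sketch. It assumes ordinary files stand in for the live data, the disk-based storage subsystem, and the tape storage device. The names `backup_to_disk`, `stream_to_tape`, and `run_demo` are hypothetical and chosen only for illustration; an actual ZDB implementation may work quite differently.

```python
import shutil
import tempfile
from pathlib import Path


def backup_to_disk(live_data: Path, disk_backup: Path) -> None:
    """Stage 1: copy the live data to the disk-based backup location.

    This is the only step during which the application would need to pause.
    """
    shutil.copyfile(live_data, disk_backup)


def stream_to_tape(disk_backup: Path, tape: Path, chunk_size: int = 64 * 1024) -> int:
    """Stage 2: stream the disk copy to 'tape' in chunks.

    Only the disk copy is read, so the live data is not touched.
    Returns the number of bytes written.
    """
    written = 0
    with disk_backup.open("rb") as src, tape.open("wb") as dst:
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            written += len(chunk)
    return written


def run_demo() -> bool:
    """Show that writes made after stage 1 do not affect the tape copy."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        live = root / "live.db"    # stands in for the application's data
        disk = root / "disk.bak"   # stands in for the disk-based subsystem
        tape = root / "tape.img"   # stands in for the tape storage device

        original = b"record-1\nrecord-2\n"
        live.write_bytes(original)

        backup_to_disk(live, disk)

        # The application resumes and keeps writing while tape streaming is pending.
        live.write_bytes(original + b"record-3\n")

        stream_to_tape(disk, tape)
        return tape.read_bytes() == original
```

In this sketch the tape copy reflects the data as of the disk backup, even though the live data changes before streaming begins, which mirrors the point that streaming to tape proceeds without further interruption of the application.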
To provide additional data protection and to ensure high availability of a storage system, a clustered arrangement may be employed. The clustered arrangement includes a cluster of multiple storage sites. In case of failure of one storage site, failover can be performed to another storage site to enable continued operation. However, managing backups in a clustered environment adds complexity. If backups are not managed properly, a recovery operation may not be possible or may not be achievable in a timely manner, which can reduce availability of the data contained in the clustered environment.