The invention relates generally to the field of digital data processing systems and more particularly to mass digital data storage subsystems. The invention provides a system and method for performing backup of data stored in a mass storage subsystem.
Digital computer systems are used in a number of applications in which virtually continuous availability of data is important to the operation of businesses or other entities using the systems. Generally, computer centers will periodically produce back-up copies of data on their various digital computer systems. Such back-up copies are usually not maintained on a continuous basis, but are instead generated at particular points in time, often at night when the normal processing load on the computer centers is reduced and modifications to the data being backed up may be minimized. In any case, the back-up copies represent the data only as of the particular points in time at which they are generated. Accordingly, if a failure occurs between back-ups, data which has been received and processed by the digital computer systems since the last back-up copy was produced may be lost. Typically, such back-up copies will be maintained by the computer centers at their respective sites so that they may be used in the event of a failure, although some off-site archival back-ups may be maintained. Significant additional problems arise in the case of catastrophic events, such as fire, flood or other natural disasters, intentional tampering or sabotage and the like, which may result in unintentional or intentional damage to an entire site or some significant portion thereof, since some or all of the back-up copies may also be damaged and the data contained thereon may be unavailable.
Several backup strategies have been developed. In one strategy, software which maintains and controls the data to be backed up, such as database software, initiates and performs the backup operation. In such an arrangement, data, generally in the form of incremental changes to a database, is provided by the database software to backup management software, which stores the data on a backup device. One advantage of this strategy is that, since only incremental changes are backed up, less data needs to be backed up at any point in time. A disadvantage, however, is that although less data is copied with this strategy, a load is still exerted on the production system processing the database software.
In a second strategy, backups are performed outside the database software. In this strategy, data files are backed up independently of the database software. While this minimizes the load on the production system processing the database software and can result in relatively high-speed backups of full data files, the backup and restore operations do not make use of the facilities that are currently provided in commercial database software.
U.S. patent application Ser. No. 08/820,912, filed Mar. 19, 1997 in the name of Philip Tamer, et al., entitled RDF-Based and MMF-Based Backups, (hereinafter "the Tamer application") assigned to the assignee of the present application, discloses another strategy. In the strategy described in the Tamer application, a data storage subsystem stores data in mirrored form, that is, it stores several copies of the data within the single data storage subsystem. Normally, when a particular item of data is modified, the data storage subsystem updates the data item in all of the copies so as to keep all of the copies coherent and in synchronization. During a backup operation, the data storage subsystem essentially de-links the copies, using one copy for data accesses by the database software and the other copy for backup. During the backup operation, modified data items are only stored in the copy that is used for data accesses. A "modified data item" record is maintained for each data item that is modified during the backup operation. After the backup operation, the "modified data item" records are processed to update the copy used for the backup operation to make the two copies identical. This is done by copying each modified data item from the copy used for data accesses to the copy used for the backup operation.
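The de-link, track and re-synchronize behavior described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation disclosed in the Tamer application: the class name `MirroredStore`, its methods, and the use of in-memory dictionaries for the two copies are all hypothetical.

```python
class MirroredStore:
    """Toy two-copy mirror (hypothetical names; dictionaries stand in for disks)."""

    def __init__(self, items):
        self.access_copy = dict(items)   # copy used for data accesses
        self.backup_copy = dict(items)   # copy used for the backup operation
        self.split = False               # True while the copies are de-linked
        self.modified = set()            # "modified data item" records

    def write(self, key, value):
        # The access copy is always updated.
        self.access_copy[key] = value
        if self.split:
            # During a backup, only record that the item changed.
            self.modified.add(key)
        else:
            # Normal operation: keep both copies in synchronization.
            self.backup_copy[key] = value

    def split_mirror(self):
        # De-link the copies at the start of the backup operation.
        self.split = True
        self.modified.clear()

    def resync(self):
        # Copy each modified item from the access copy to the backup copy.
        for key in self.modified:
            self.backup_copy[key] = self.access_copy[key]
        self.modified.clear()
        self.split = False


store = MirroredStore({"a": 1, "b": 2})
store.split_mirror()
store.write("a", 10)                   # lands only in the access copy
snapshot = dict(store.backup_copy)     # what the backup would read
store.resync()                         # the two copies are identical again
```

Because only the recorded items are copied during `resync`, the cost of re-linking the copies in this sketch scales with the number of items modified during the backup rather than with the total amount of data.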
The invention provides a new and improved system and method for backing up data stored in multiple mirrors on a mass storage subsystem under control of a backup server.
In brief summary, a backup server in one aspect controls the backing up of data stored on a mass storage subsystem in response to a backup request from a host identifying data to be backed up during a backup operation, the mass storage subsystem storing data in a plurality of mirrored copies. The backup server comprises a discovery module, a preparation module, an execution module and a clean-up module. The discovery module receives the backup request and identifies, during a discovery phase, at least one storage location on the mass storage subsystem on which data to be backed up during the backup operation is stored. The preparation module, during a preparation phase following the discovery phase, enables the mass storage subsystem to sever one of said mirrored copies and make it available to the backup server for the backup operation. In addition, prior to enabling the mass storage subsystem to sever one of the mirrored copies, the preparation module will notify the host, which will stop operating in connection with the data from the mass storage subsystem, and after the mirrored copy has been severed, the preparation module will so notify the host so that it can resume operating in connection with data from at least one of the other copies. The execution module, during an execution phase, enables the mass storage subsystem to retrieve data from the at least one storage location and transfer the retrieved data to the backup server to facilitate backup storage. The clean-up module, during a clean-up phase following the execution phase, verifies that the data to be backed up has been stored in backup storage and enables the mass storage subsystem to re-synchronize the mirrored copies.
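The ordering of the four phases can be sketched in Python as follows. This is a simplified illustration only, assuming in-memory dictionaries in place of the mass storage subsystem and backup storage; the class names `Host`, `MassStorage` and `BackupServer`, their method names, the event log, and the choice to re-synchronize regardless of the verification result are all assumptions made for the example.

```python
class Host:
    """Hypothetical host that can pause and resume its use of the data."""
    def __init__(self, log):
        self.log = log
    def quiesce(self):
        self.log.append("host:stop")
    def resume(self):
        self.log.append("host:resume")


class MassStorage:
    """Hypothetical subsystem holding two mirrored copies as dictionaries."""
    def __init__(self, data, log):
        self.mirrors = [dict(data), dict(data)]
        self.severed = False
        self.log = log
    def sever_mirror(self):
        self.severed = True
        self.log.append("storage:sever")
    def read_severed(self, location):
        return self.mirrors[1][location]
    def resynchronize(self):
        self.mirrors[1] = dict(self.mirrors[0])
        self.severed = False
        self.log.append("storage:resync")


class BackupServer:
    def __init__(self, host, storage):
        self.host, self.storage = host, storage
        self.backup_storage = {}

    def discover(self, request):
        # Discovery phase: identify the storage locations holding the data.
        return [loc for loc in request if loc in self.storage.mirrors[0]]

    def prepare(self):
        # Preparation phase: notify host, sever a mirror, notify host again.
        self.host.quiesce()
        self.storage.sever_mirror()
        self.host.resume()

    def execute(self, locations):
        # Execution phase: pull data from the severed copy into backup storage.
        for loc in locations:
            self.backup_storage[loc] = self.storage.read_severed(loc)

    def clean_up(self, locations):
        # Clean-up phase: verify the backup, then re-synchronize the mirrors.
        verified = all(loc in self.backup_storage for loc in locations)
        self.storage.resynchronize()
        return verified

    def run_backup(self, request):
        locations = self.discover(request)
        self.prepare()
        self.execute(locations)
        return self.clean_up(locations)


log = []
server = BackupServer(Host(log), MassStorage({"vol1": "x", "vol2": "y"}, log))
ok = server.run_backup(["vol1"])
```

In this sketch the host is paused only for the duration of the sever step, so it resumes work against the remaining copy before any data is transferred to backup storage.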
A benefit of the use of a backup server, separate and apart from a host, for controlling backup during a backup operation, is that the host is relieved of the burden of managing backup operations, which can enhance throughput by the host.
In another aspect, the backup server controls the restoration of data on a mass storage subsystem in response to a restore request from a host identifying data to be restored during a restore operation. In that aspect, the discovery module receives the restore request and identifies, during a discovery phase, at least one storage location on the mass storage subsystem on which data is to be restored during the restore operation. The preparation module, during a preparation phase after the discovery phase, notifies the host that the backup server is in condition to enter an execution phase. The execution module, during the execution phase, enables the mass storage subsystem to receive data from backup storage and store it on the at least one storage location to facilitate restoration. The clean-up module, during a clean-up phase following the execution phase, verifies that the data to be restored has been stored on the at least one storage location and, if so, enables the mass storage subsystem to re-synchronize the mirrored copies and notifies the host that the restore operation has completed.
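The restore aspect follows the same four phases, which can be sketched as a single Python function. The function name `run_restore`, its parameters, the use of dictionaries for the mirrored copies and backup storage, and the `notify_host` callback are hypothetical and serve only to show the order of the steps.

```python
def run_restore(request, backup_storage, mirrors, notify_host):
    """Restore the requested locations from backup_storage into the mirrors.

    request        -- list of storage locations the host wants restored
    backup_storage -- dict mapping location -> backed-up data
    mirrors        -- list of dicts, one per mirrored copy
    notify_host    -- callable taking a status string (an assumption)
    """
    # Discovery phase: identify the locations on which data is to be restored.
    locations = [loc for loc in request if loc in backup_storage]

    # Preparation phase: tell the host the server is ready to execute.
    notify_host("ready")

    # Execution phase: store the backed-up data on one copy.
    target = mirrors[0]
    for loc in locations:
        target[loc] = backup_storage[loc]

    # Clean-up phase: verify, and only then re-synchronize and notify.
    verified = all(target.get(loc) == backup_storage[loc] for loc in locations)
    if verified:
        for other in mirrors[1:]:
            other.clear()
            other.update(target)
        notify_host("restore-complete")
    return verified


events = []
copies = [{"vol1": "corrupt"}, {"vol1": "corrupt"}]
restored = run_restore(["vol1"], {"vol1": "good"}, copies, events.append)
```

Here the host is told the restore has completed only after verification succeeds and the remaining copies have been brought back into agreement with the restored copy.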