1. Field of the Invention
The present invention relates generally to the field of computer mass storage systems, such as multiple disk systems and libraries, and more particularly to making time available for duplicating storage system contents for backup, recovery and other purposes without disrupting the business applications using the storage systems.
2. Background
More and more businesses have computer system operations that must run nearly continuously, 24 hours a day, 7 days a week, 365 days a year, with little or no interruption to service, loss of system availability, or loss of data in mass storage. Many of these types of operations involve the use of huge databases, data sets or files stored on multiple disk systems. When files and data sets were only a few hundred thousand bytes or even a few megabytes in size, they could be backed up (read and copied in their entirety) in a few minutes. If a test update of the file caused errors in the new version, the old status quo could be restored to the storage system from a backup tape or disk in a few minutes. Similarly, updating the file often took only minutes. However, disk capacity, and then multiple disk system capacity, such as that provided by Redundant Arrays of Independent Disks (RAID) systems and Hierarchical Storage Management (HSM) systems, has made it possible to store gigabytes and then terabytes of data in larger and larger databases and data warehouses. As a result, tasks such as backup and restore, or creating alternative versions of the data for new version testing, can now take 6-12 hours or more to accomplish, even on powerful mainframe computer systems. For businesses that must be in continuous operation around the clock or nearly so, the sheer amount of data that must be backed up for such purposes presents a difficult time problem.
If such a business application has to be stopped so that a backup can be made, six to twelve hours or more could be lost. Many of today's business applications cannot be halted or disrupted for this length of time. On the other hand, if the business applications are operating constantly and backups are not made, data security can be jeopardized if disasters or system failures destroy the current contents of databases and files. Thus, while backups become more critical for disaster recovery programs and alternative testing needs, time becomes less and less available for them in a production schedule.
At the same time that the inherent capacity of mass storage systems has been increasing dramatically, a number of other factors that provide reliability benefits to the system user may sometimes complicate the backup problems. For example, the likelihood of disk failures and data loss has been significantly reduced by RAID disk formatting and recording techniques such as mirroring, in which simultaneous mirror copies of the data sets and databases being updated are created on disk. In the event of a failure of one disk, its mirror can be used immediately by the production business application program. Other RAID formats also help lower the risk of online disk failures and data loss. The Symmetrix 5xxx systems of Applicants' Assignee, EMC Corporation, provide such reliability and availability benefits to users. Many of the business applications that require continuous operation and use of large databases also use mainframe computers, such as IBM's System/390 series computers and the MVS operating system. These provide additional availability and reliability features such as multiple paths to data sets, sophisticated disk control and access software such as System Managed Storage (SMS) and VSAM, and data set control and cataloging features.
The advantage of using such technology is an improved ability to keep the business applications operating almost around the clock. However, even with mirrored disk volumes and alternate data paths, system administrators still need to make backup copies of databases and data sets for several different purposes. One such purpose is, as mentioned, disaster recovery. If the data center where a mass storage system is located is damaged or destroyed by an accident or natural disaster, the system administrator should be able to recreate all the key databases and data sets elsewhere from backups, so that business can be resumed as quickly as possible.
Another purpose for backups is to enable testing of new software or new versions of software. One of the best ways to complete the testing of new software features for a major business application is by using “live data” or close to it, but without risking the actual production data. Traditionally, this has been done by using a backup version of the real data file. A backup version is essentially another copy of the data on another storage device or devices.
Continuous or nearly continuous business operation by definition means that the business application should not be stopped, or quiesced, if possible, even to allow a full backup to be made, especially if the amount of data means backups will take hours to perform. The makers of database programs for large files have attempted to address the problems of backing up and restoring data by using incremental backups and transaction logs. Incremental backups allow the user to make one “big” backup periodically and several smaller ones that reflect only what has changed. These may also be used in connection with transaction logs that let the database software recreate changes made since the last specified incremental backup.
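By way of illustration only, the relationship between a full backup, incremental backups and a transaction log can be sketched as follows. This is a minimal sketch and not any vendor's implementation: the database is modeled as an in-memory dictionary, and the names full_backup, incremental_backup and restore are hypothetical.

```python
# Illustrative sketch only; all names and data structures are hypothetical.

def full_backup(db):
    """Copy every record (the periodic "big" backup)."""
    return dict(db)

def incremental_backup(db, changed_keys):
    """Copy only the records changed since the previous backup."""
    return {key: db[key] for key in changed_keys}

def restore(full, incrementals, log):
    """Rebuild state from the full backup, then each incremental
    backup in order, then replay the logged changes."""
    state = dict(full)
    for inc in incrementals:
        state.update(inc)
    for key, value in log:
        state[key] = value
    return state

db = {"acct1": 100, "acct2": 200, "acct3": 300}
full = full_backup(db)

# One record changes; only that record goes into the incremental backup.
db["acct1"] = 150
inc1 = incremental_backup(db, {"acct1"})

# A later change is captured only in the transaction log.
log = []
db["acct2"] = 250
log.append(("acct2", 250))

recovered = restore(full, [inc1], log)
```

In this sketch the incremental backup holds one record rather than three, which is the source of the time saving, but the full backup must still read and copy every record.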
Even so, backups such as these can still take hours when the files are big enough. They are also limited to specific database or application programs. Legacy applications (programs originally written years or even decades ago but still in production use on computers) using large files may not have access to such programs.
One technique, known as a “side file,” has been used by Above Technology and Veritas to address part of the problem. In this approach, a special driver on the host computer creates a separate file, called the side file, and writes incoming data to it instead of to the main file. When the side file fills up, its contents can be copied into the main file and the side file is then reused. Copying the side file to the main file can still cause some interruption to the main business application program, however. The side file is also usually not a complete backup, but only a partial one.
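The side file behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual driver of any named vendor: the class SideFileStore, its capacity parameter, and the rule that reads consult the side file first are all hypothetical.

```python
# Illustrative sketch only; SideFileStore and its parameters are hypothetical.

class SideFileStore:
    def __init__(self, capacity):
        self.main = {}            # the main file: block number -> data
        self.side = {}            # the side file: pending writes
        self.capacity = capacity  # assumed number of blocks the side file holds
        self.merges = 0           # how many times the side file was copied back

    def write(self, block, data):
        """Writes go to the side file instead of the main file."""
        self.side[block] = data
        if len(self.side) >= self.capacity:
            self.merge()

    def read(self, block):
        """Assumption: the newest data is found by checking the side file first."""
        if block in self.side:
            return self.side[block]
        return self.main.get(block)

    def merge(self):
        """Copy the side file into the main file, then reuse the side file.
        This is the step that can interrupt the application."""
        self.main.update(self.side)
        self.side.clear()
        self.merges += 1

store = SideFileStore(capacity=2)
store.write(0, "a")
main_before_merge = dict(store.main)  # main file is still untouched here
store.write(1, "b")                   # side file is full, so it is merged
store.write(0, "c")                   # goes to the reused side file
```

Until a merge occurs the main file is unchanged, which is why the main file can be read consistently in the meantime; the side file itself holds only the recent writes and so is not a complete copy of the data.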
Another approach is a technique known as log-structured files. A log-structured file storage system typically writes all modifications to disk sequentially in a log-like structure, which speeds up file writing and crash recovery. The log usually has index-like data so that files can be read back from the log efficiently. All writes typically go to the end of storage. If the storage system saves the old blocks and a copy of all the pointers, it has a snapshot of the prior state before a write operation. Thus, the old view serves as a partial backup.
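A minimal sketch of this idea follows, assuming an in-memory list stands in for the sequential log and a dictionary stands in for the index. The class LogStructuredStore and its methods are hypothetical and are not drawn from any particular file system.

```python
# Illustrative sketch only; LogStructuredStore is a hypothetical model.

class LogStructuredStore:
    def __init__(self):
        self.log = []    # every modification is appended; old blocks are kept
        self.index = {}  # index-like data: file name -> position in the log

    def write(self, name, data):
        """Append the new block at the end of the log and repoint the index."""
        self.log.append(data)
        self.index[name] = len(self.log) - 1

    def read(self, name, index=None):
        """Read through the current index, or through a saved copy of it."""
        pointers = self.index if index is None else index
        return self.log[pointers[name]]

    def snapshot(self):
        """Save a copy of all the pointers; no data blocks are copied."""
        return dict(self.index)

store = LogStructuredStore()
store.write("payroll", "version 1")
snap = store.snapshot()              # prior state, captured before the next write
store.write("payroll", "version 2")

current = store.read("payroll")      # follows the live index
old = store.read("payroll", snap)    # follows the saved pointers
```

Because the old block is never overwritten, the saved pointers continue to resolve to the prior contents after the write.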
IBM in its RAMAC virtual array device also uses a log file structure to create a snapshot of the data. In this approach, a snapshot is simply the creation of a new view of the volume or data set being “snapped” by copying the pointers in the log file structure. None of the actual data is accessed, read, copied or moved. Any updates that are made to the snapshot will be effective for that view of the data; any other views remain unchanged.
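The independence of views described above can be illustrated with the following sketch. It is a simplified model only and does not represent the internal design of the RAMAC device: the class Volume, its shared block list, and the method names are hypothetical.

```python
# Illustrative sketch only; Volume is a hypothetical model of a "view".

class Volume:
    def __init__(self, blocks, pointers=None):
        self.blocks = blocks                  # block storage shared by all views
        self.pointers = dict(pointers or {})  # this view's own pointer table

    def write(self, track, data):
        """Store the new data in a new block and repoint only this view."""
        self.blocks.append(data)
        self.pointers[track] = len(self.blocks) - 1

    def read(self, track):
        return self.blocks[self.pointers[track]]

    def snap(self):
        """Create a new view by copying the pointers; no data is read or moved."""
        return Volume(self.blocks, self.pointers)

shared_blocks = []
production = Volume(shared_blocks)
production.write("track0", "original")

blocks_before_snap = len(shared_blocks)
test_view = production.snap()
blocks_after_snap = len(shared_blocks)  # unchanged: only pointers were copied

test_view.write("track0", "test update")  # affects the snapshot view only
```

After the update, the snapshot view reads the new data while the production view still reads the original block.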
These tend to be partial solutions, however. They may not be compatible with existing structures and control mechanisms in the MVS operating system or others or with the principal data set and access method formats used. They also do not provide a complete and convenient system for backing up and restoring data in an MVS mainframe environment without halting or disrupting the business operation at some point.
It is an object of this invention to provide a way to back up and restore data in a mainframe computer environment without halting or disrupting the business applications in process.
It is another object of the present invention to provide a method and apparatus for backing up and restoring data in mass storage systems that is compatible with the MVS operating system's control structures, storage formats and access methods.