Data processing systems have conventionally comprised two major elements: a host computer, comprising an arithmetic and logic unit and a main memory unit in which the software or programs being run are stored, and a data storage system, in which the data, and in some cases the programs as well, are stored when not actually in use. Historically, the market has developed in such a way that the evolution of new systems has proceeded piecemeal; that is, improvements in host computers are only commercially viable when they do not require the discarding of old data storage systems. Similarly, improved data storage systems are primarily useful when they can improve the performance of a given host computer without requiring its replacement. The present invention is of the latter type, in that it provides improvements in the efficiency of operation of a large data processing system without requiring substantial modification or replacement of the host computer.
It will be recognized by those skilled in the art that a common problem in the data processing industry is the storage, over long periods of time, of the vast amounts of data accumulated in data processing operations, in a way which is as inexpensive as possible while remaining reliable. A number of commonly assigned applications have been filed for devices directed to this end. For example, copending Ser. Nos. 389,295, now abandoned, and 384,381, now U.S. Pat. No. 4,467,421, as well as other applications, relate to a so-called Virtual Storage System (VSS). The device described in these applications comprises what in other circumstances might be considered an independent processor, operating outboard of a conventional "channel", as that term is used in connection with the computer systems sold by IBM Corporation. The VSS processor divides data sets supplied by the host computer into subportions sized to fit conveniently onto disk and tape storage media, and maintains a directory of the locations of these subportions, such that upon their subsequent recall the data set can be reassembled and presented to the host computer, all without the host computer requiring detailed information as to the actual storage locations and without involving the user in any significant way. The virtual storage system as described in those applications also contemplates that the intelligence of the outboard storage subsystem processor can be used to copy data sets from relatively expensive disk memory to relatively less expensive tape memory for long-term archival storage, so that if, for example, a data set stored on disk is destroyed by some mischance, an accurate copy made on tape can readily be supplied to replace the lost data set. This approach is workable and performs the data back-up and recovery operations automatically; that is, without CPU intervention.
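The directory scheme described above can be illustrated with a minimal sketch. This is not the patented VSS implementation; the class and method names, the fixed subportion size, and the callable storage interface are all illustrative assumptions introduced here solely to show the split-record-reassemble idea.

```python
# Hypothetical sketch of the VSS directory concept: a data set is split
# into subportions, each subportion's storage location is recorded, and
# the data set is later reassembled from those recorded locations.
# Names and structure are illustrative assumptions, not the patented design.

class VirtualStorageDirectory:
    """Maps each data set name to the locations of its subportions."""

    def __init__(self, subportion_size):
        self.subportion_size = subportion_size
        # data set name -> ordered list of (medium, address) locations
        self.directory = {}

    def store(self, name, data, place_subportion):
        """Split `data` into subportions and record where each one lands.

        `place_subportion` stands in for the storage layer: it writes one
        subportion to disk or tape and returns a (medium, address) location.
        """
        locations = []
        for i in range(0, len(data), self.subportion_size):
            chunk = data[i:i + self.subportion_size]
            locations.append(place_subportion(chunk))
        self.directory[name] = locations

    def recall(self, name, read_subportion):
        """Reassemble a data set from its recorded subportion locations."""
        return b"".join(read_subportion(loc) for loc in self.directory[name])
```

Because the directory, not the host, holds the location information, the host computer can request a data set by name alone, which is the point the passage above makes about relieving the host of detailed storage knowledge.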
However, the data storage system described in those applications has a few drawbacks. It requires substantial additional computing power and hence expense. Furthermore, the embodiment now available is limited to physically sequential data sets. Perhaps more significantly, the data set is not available while it is being copied from disk to tape. Finally, its software operates independently of any data management software which may be running on the host computer, which can give rise to access conflicts.
Other prior art approaches are even less satisfactory than that just described. For example, basic back-up operations as previously performed simply use the CPU itself as the conduit through which data flows from disk to tape. This is highly undesirable because it consumes valuable CPU time to perform an essentially simple copying function, cutting into the availability of the host computer, the most valuable resource of any data processing system. Furthermore, the data set being backed up is not available during the operation, thus delaying work on programs using that data set. Moreover, these back-up operations are typically performed as a batch job, once per day; if a data set is modified more than once a day, the back-up copy is at some point out of date.
Nor has this problem been addressed by commercially available host computer software. IBM's data base management program, referred to as "HSM" for Hierarchical Storage Manager, does perform some back-up operations automatically, but again uses the CPU to run the program, reducing the CPU's availability, and does not provide a way for the data set being backed up to remain available simultaneously for other purposes.
Ideally, then, a good back-up and recovery system meets four particularly important requirements. First, the back-up operations performed should be timely, preserving the integrity of the data as often as it is changed, so that the backed-up data set is always up to date. Second, the data should be available to other programs while the back-up operation is taking place, in order to minimize processing delays. Third, the impact of back-up operations on the CPU, as well as on job throughput, should be minimized so that back-up does not interfere unduly with other processing operations. Finally, the recovery operation, the copying of the data from tape back to disk when the data set on disk has failed for some reason, should be manageable and easily performed by an operator, rather than requiring a long sequence of complex operations which would tend to lead to error.