1. Field of the Invention
This invention generally relates to storage devices for use in data processing systems and more particularly to a system that enables a magnetic disk storage device to emulate a magnetic tape storage device.
2. Description of Related Art
Data centers that process and maintain large quantities of data generally include two types of mass storage devices, namely, magnetic disk storage devices and magnetic tape storage devices. Both types of mass storage devices typically operate in large-scale, multiple-processor systems. These systems further include sophisticated operating systems for controlling various resources connected to one or more central processors. The Multiple Virtual Storage operating system of IBM (commonly called "MVS") is one such system.
Data centers operate with different configurations that may include certain magnetic disk storage devices organized as primary storage devices. Other magnetic disk storage devices may act as mirrors or as redundant storage devices to provide instantaneous backups. In a redundant configuration, data that overwrites existing data on the primary storage device immediately overwrites the corresponding data on the redundant magnetic disk storage device, so no historical record is maintained of different versions of a system.
Typically, magnetic disk storage devices are used for "active" data because access to specific data in a magnetic disk storage device is more rapid than access to data in a magnetic tape storage device. Magnetic tape storage devices typically store archived or backup data, primarily because the perceived cost of magnetic tape storage is significantly lower than the perceived cost of magnetic disk storage.
Magnetic tape storage devices are the devices of choice for generating historical backups. Given the perceived costs of the different media, tape storage has represented the only practical approach to providing such historical backups. Thus, in the case of program development, for example, each revision of the program may be transferred to magnetic tape, leaving only the most current version of the program on a magnetic disk storage device.
Transfers to magnetic tape storage generally occur in response to the execution of a batch file that identifies one or more files or volumes for backup to a particular magnetic tape storage device as a resource. A host processor runs the batch file to transfer the named file or files from the primary disk storage device to the secondary tape storage device. In a second approach, the age of files on a primary storage device is ascertained, and "older" files are transferred to the tape. Unfortunately, because a particular batch job must make its transfer to one tape, tape utilization is often poor. That is, the data stored on a tape may occupy only a few percent of the available storage space on the tape. Moreover, the associated testing and transfer operations require host processing cycles that can degrade host performance for other applications.
In another approach, a second magnetic disk storage device connects to the host. It generally will have about fifteen percent of the total capacity of the primary magnetic disk storage device. Aged data is swept from the primary magnetic disk storage device to the second magnetic disk storage device. This process is more efficient than the above-identified tape transfer process. As space on the second magnetic disk storage device is needed, the oldest data is transferred to the magnetic tape storage device. Although the process can improve performance somewhat, tape utilization still is generally poor. That is, it has been found that about one third of the applications will nearly fully utilize a tape, about one third will provide intermediate utilization and about one third will underutilize the tape. For example, it is not unusual to find only a 5-megabyte file on a 1- to 2-gigabyte tape.
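The two-stage aging scheme described above can be illustrated with a minimal sketch. It is not taken from any particular product; the names `DataSet`, `sweep_aged` and `spill_oldest`, the 30-day age limit and all sizes are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    size_mb: int
    age_days: int

def sweep_aged(primary, secondary, age_limit_days):
    """Move data sets older than age_limit_days from primary to secondary disk."""
    aged = [d for d in primary if d.age_days > age_limit_days]
    for d in aged:
        primary.remove(d)
        secondary.append(d)

def spill_oldest(secondary, tape, capacity_mb):
    """While the secondary disk is over capacity, move its oldest data set to tape."""
    while sum(d.size_mb for d in secondary) > capacity_mb:
        oldest = max(secondary, key=lambda d: d.age_days)
        secondary.remove(oldest)
        tape.append(oldest)

# Hypothetical contents of the primary disk.
primary = [DataSet("a", 40, 90), DataSet("b", 30, 45), DataSet("c", 50, 2)]
secondary, tape = [], []

sweep_aged(primary, secondary, age_limit_days=30)  # assumed age limit
spill_oldest(secondary, tape, capacity_mb=50)      # assumed secondary capacity
```

In this sketch each data set that spills to tape is written individually, which is why the scheme by itself does nothing to raise tape utilization.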
When such underutilization occurs, the real cost for tape becomes significantly higher. That is, the total cost of the media associated with the underutilized tape increases the "per-byte" cost of actual storage. If the number of tape drives in a system is not changed, increasing the number of tapes requires tape mounting and demounting that might otherwise be avoided if the tapes were utilized fully. The alternative is to add more tape drives, but that increases the floor space required for the data center.
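The effect of underutilization on the "per-byte" cost can be shown with a short calculation. This is a sketch under stated assumptions: the cartridge price is a hypothetical figure, a 1-gigabyte tape is taken as 1000 megabytes, and the function name is illustrative only.

```python
def effective_cost_per_mb(tape_cost, used_mb):
    """Media cost divided by the megabytes actually stored on the tape."""
    if used_mb <= 0:
        raise ValueError("used_mb must be positive")
    return tape_cost / used_mb

TAPE_COST = 10.0         # hypothetical price of one cartridge
TAPE_CAPACITY_MB = 1000  # 1-gigabyte tape, assuming 1 GB = 1000 MB
FILE_MB = 5              # the 5-megabyte file from the example above

full_cost = effective_cost_per_mb(TAPE_COST, TAPE_CAPACITY_MB)
sparse_cost = effective_cost_per_mb(TAPE_COST, FILE_MB)

# The ratio depends only on utilization, not on the assumed price.
cost_ratio = TAPE_CAPACITY_MB / FILE_MB
```

Under these assumptions the effective cost per megabyte of the nearly empty tape is 200 times that of a full tape, whatever the cartridge price.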
Tape mount management or similar programs can operate with special hardware configurations that include disk buffers to accumulate data from jobs for subsequent transfer to tapes. Buffer capacity in such systems is limited. While this approach can also improve tape utilization, the system still relies on tapes and still requires tape farms or other physical tape drives.
In still another approach, management software collects data to be transferred to magnetic tape. The management software then transfers all the data from different jobs, commonly called "data sets", onto a single magnetic tape. Initially this improves magnetic tape utilization. However, as known, data sets often are stored with a finite life, and a single tape will store data sets with lives ranging from a few days to a few months. As different data sets expire, tape utilization decreases. To maintain high levels of utilization, the tapes are recycled regularly to consolidate data sets on the magnetic tapes. This recycling process is extremely time-consuming, especially in data processing systems with hundreds or thousands of magnetic tapes.
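The decline in utilization as data sets expire can be sketched as follows. The sizes, lifetimes and tape capacity are hypothetical, and `utilization` is an illustrative helper rather than a function of any management software.

```python
def utilization(data_sets, capacity_mb, day):
    """Fraction of tape capacity still holding unexpired data on the given day."""
    live_mb = sum(size_mb for size_mb, life_days in data_sets if life_days > day)
    return live_mb / capacity_mb

# Hypothetical (size_mb, life_days) pairs consolidated onto one tape.
data_sets = [(300, 7), (300, 30), (400, 90)]
CAPACITY_MB = 1000  # assumed tape capacity

day_0 = utilization(data_sets, CAPACITY_MB, 0)    # all data sets still live
day_10 = utilization(data_sets, CAPACITY_MB, 10)  # the 7-day data set has expired
day_60 = utilization(data_sets, CAPACITY_MB, 60)  # only the 90-day data set remains
```

A tape that starts full in this sketch falls to 40 percent utilization by day 60, which is the condition that recycling is meant to correct.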
Consequently the total costs for storing data on magnetic tape storage devices can be significantly higher than the perceived cost. Simultaneously with the increased need for tape storage, the cost of storage on magnetic disk storage devices is falling. Comparable transfer rates, even during data streaming, are achievable in both the magnetic disk storage devices and magnetic tape storage devices. Moreover the ability to dynamically relocate data on a disk device provides an opportunity to utilize space very effectively and achieve high levels of space utilization.
Several proposals have been made to use magnetic disk storage devices as magnetic tape storage devices, that is, to emulate a magnetic tape storage device or operate a magnetic disk storage device as a virtual tape device. However, such proposals require new special-purpose hardware and software modifications such that emulation is not transparent to the user. Consequently the emulation does not act as a true virtual device.
As known, a number of older data processing systems use magnetic tape storage devices as primary storage devices for data generated by an application program. In such applications, tape WRITE requests transfer data directly to a magnetic tape, rather than to a magnetic disk storage device as would occur with more recent applications. Many of these application programs continue to be used today.
Magnetic disk storage devices acting as virtual tapes can greatly enhance the performance of these applications because, as known, transfers to a magnetic disk storage system are often much faster than transfers to a magnetic tape storage system. However, as these application programs undergo continuing development, programming errors can appear. The introduction of an endless loop that includes a write tape request represents one such error. If an endless loop contains a write tape request, writing operations to a conventional magnetic tape storage unit will continue until an end-of-tape return code is received. In some situations this could involve filling multiple tape cartridges with useless data.
Conventional virtual tape devices generally define a volume that corresponds to the total capacity of one or more tape devices. For example, if a particular configuration allocated five 800-MB tapes to an application, an endless loop would not result in an end-of-tape return code indicating a problem until 4 gigabytes of disk storage had been consumed. While such data is easy to delete from a magnetic disk storage device, a primary problem lies in the resources that must be devoted to processing such an endless loop, particularly any common cache that is involved when an endless loop is processed. Such an allocation of resources will be to the detriment of other application programs that are running concurrently in a multi-processor system.
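The behavior of an endless write loop against such a volume can be illustrated with a minimal simulation. The 1-MB write size and the function name are assumptions made for illustration; five 800-MB tapes give a combined volume of 4,000 MB.

```python
TAPE_MB = 800  # capacity of one emulated tape
TAPES = 5      # tapes allocated to the application
BLOCK_MB = 1   # hypothetical size of each tape write request

def writes_until_end_of_tape(volume_mb, block_mb):
    """Count write requests accepted before the volume reports end-of-tape."""
    written_mb = 0
    writes = 0
    # Stands in for the endless loop: it stops only when the volume is full.
    while written_mb + block_mb <= volume_mb:
        written_mb += block_mb
        writes += 1
    return writes, written_mb

writes, written_mb = writes_until_end_of_tape(TAPE_MB * TAPES, BLOCK_MB)
```

Under these assumptions the loop issues 4,000 write requests and consumes the entire 4,000-MB volume before any end-of-tape condition signals the error.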