Large-scale mainframe computers continue to be used extensively across many industries. Historically, tape storage has been used to provide permanent and temporary data protection services to those mainframes. In such environments, it is not uncommon for mainframe tape libraries to hold hundreds of terabytes (TB) of data spread across tens of thousands of tape volumes.
Virtual tape emulation (VTE) products such as DLm, available from EMC Corporation of Hopkinton, Mass., can be used to emulate a given number of tape volumes to the mainframe using disk drives as the storage media instead of magnetic tape. As a mainframe-based application writes data to what it believes is a tape drive, that data is actually stored as a tape volume image on a direct access storage device such as a disk array subsystem. Each individual tape volume written by the mainframe becomes a single disk file on the filesystem on the disk array.
Such VTE products ultimately allow the operators of mainframe data centers to move from a tape-based backup solution to a disk-based backup solution, leveraging today's high-speed, low-cost disk technology to provide an innovative approach to data storage.
The mainframe host writes data to the virtual tape drive using the same commands it would use if it were writing to an actual magnetic tape drive. One of these commands establishes a synchronization point, or sync point. A sync point is a point in time at which the host mainframe can validly assume that any data previously written to the tape has been safely stored on the tape media. When the sync condition occurs while writing to a real tape, any buffered data not yet written to the media is immediately flushed out to the tape media, from the perspective of the tape drive. The tape drive can therefore immediately detect any write errors that occur during this buffer flushing operation. Upon completion of the flush to the media, a success/fail indication is returned to the host operation (such as an application program) that triggered the sync condition.
In other words, as the host executes a series of write operations to a tape drive, it expects some buffering to occur, so that there will be periods of time when not all of the data has yet been recorded on the tape. If an error occurs during this time, the mainframe should accommodate the fact that some data may be lost. The understanding between the mainframe and the tape drive is that, until the sync point is reached, errors may occur that result in data loss, and the mainframe should take steps to accommodate such possible data loss. However, once the host issues a sync command, the host typically expects to wait until the tape drive reports that all data has been safely written before taking another action.
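The write/sync contract described above can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not the behavior of any particular tape drive or channel protocol: the class `BufferedTapeDrive`, its `bad_blocks` parameter (used here only to simulate a media write error), and the list-based buffer are all hypothetical names introduced for illustration.

```python
class BufferedTapeDrive:
    """Toy model of a physical tape drive with a write buffer (hypothetical)."""

    def __init__(self, bad_blocks=()):
        self.buffer = []  # blocks accepted from the host, not yet on media
        self.media = []   # blocks safely recorded on the tape
        # Assumption for illustration: these block payloads fail when flushed.
        self.bad_blocks = set(bad_blocks)

    def write(self, block: bytes) -> None:
        # A write returns immediately; the data is only buffered, so an
        # error before the next sync point may still lose it.
        self.buffer.append(block)

    def sync(self) -> bool:
        """Flush buffered blocks to media; return True on success, False on error."""
        while self.buffer:
            block = self.buffer[0]
            if block in self.bad_blocks:
                # The drive detects the write error during the flush itself,
                # so the host learns of the failure at the sync point.
                return False
            self.media.append(block)
            self.buffer.pop(0)
        return True


drive = BufferedTapeDrive()
drive.write(b"rec1")
drive.write(b"rec2")
ok = drive.sync()  # the host waits here for the success/fail indication
```

In this model the host may treat `rec1` and `rec2` as safely stored only after `sync()` returns `True`; a `False` result tells it that some buffered data did not reach the media and should be rewritten.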
Virtual tape systems introduce one or more layers of additional buffering, not only in the virtual tape server itself but also in the backend storage filesystems. As a result, there is no guarantee of immediate feedback in response to a write error. In a worst case scenario, error status may not be returned to the virtual tape emulator from the backend storage system until after the entire filesystem has been written and closed. This can be far longer than the host mainframe can tolerate waiting after a sync point, and long after the host may have made decisions based on the false assumption that the data was safely written.
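The deferred-error problem can be sketched in Python as follows. This is an illustrative model only: `BackendFile`, `NaiveVirtualTape`, and the `fail_on_flush` flag are hypothetical names, and raising the error at `close()` is an assumption chosen to represent the worst case described above, not the documented behavior of any specific storage product.

```python
class BackendFile:
    """Toy backend file whose write errors surface only at close() (hypothetical)."""

    def __init__(self, fail_on_flush=False):
        self.cache = []    # data accepted into the backend's own cache
        self.stored = []   # data actually persisted by the backend
        self.fail_on_flush = fail_on_flush

    def write(self, block: bytes) -> None:
        self.cache.append(block)  # always "succeeds": the data is only cached

    def close(self) -> None:
        if self.fail_on_flush:
            raise OSError("backend write failed")
        self.stored.extend(self.cache)
        self.cache.clear()


class NaiveVirtualTape:
    """Virtual tape that trusts the backend to complete its writes."""

    def __init__(self, backend: BackendFile):
        self.backend = backend
        self.buffer = []

    def write(self, block: bytes) -> None:
        self.buffer.append(block)

    def sync(self) -> bool:
        # Hands the data to the backend and reports success, without
        # knowing whether the backend has persisted anything yet.
        for block in self.buffer:
            self.backend.write(block)
        self.buffer.clear()
        return True

    def close(self) -> None:
        self.backend.close()  # the error, if any, appears only here


tape = NaiveVirtualTape(BackendFile(fail_on_flush=True))
tape.write(b"rec1")
sync_reported_ok = tape.sync()  # True, although nothing is persisted yet
```

Here `sync()` reports success while `stored` is still empty, and the failure becomes visible only when `tape.close()` raises `OSError`, which is after the host has already acted on the sync result.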
In addition, any errors in transmission that were not detected by the backend storage system, or undetected errors introduced by the backend storage system itself, may not be reported to the virtual tape server at all. In this case the mainframe will not even see the errors. Such data integrity errors will only be detected at a much later time if and when the mainframe finally attempts to read the data back.
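A short Python sketch can illustrate such an undetected error. The class `SilentlyCorruptingBackend` is a hypothetical stand-in for a backend that alters data without reporting any failure; the single inverted byte is an arbitrary choice made only to simulate corruption.

```python
class SilentlyCorruptingBackend:
    """Toy backend that corrupts data without raising any error (hypothetical)."""

    def __init__(self):
        self.stored = bytearray()

    def write(self, data: bytes) -> None:
        corrupted = bytearray(data)
        if corrupted:
            corrupted[0] ^= 0xFF  # invert the first byte; no error is reported
        self.stored += corrupted

    def close(self) -> None:
        pass  # completes normally, so the writer sees no failure

    def read(self) -> bytes:
        return bytes(self.stored)


backend = SilentlyCorruptingBackend()
original = b"payroll-record"
backend.write(original)
backend.close()

# Every write-side call completed without error. The mismatch is visible
# only when the data is eventually read back and compared.
mismatch_on_read = backend.read() != original
```

Because `write()` and `close()` both return normally, nothing on the write path indicates a problem; the damage appears only in the read-back comparison.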
The problem is further exacerbated by the fact that the backend storage array itself has latency. Although disk drives are quite fast as compared to tape drives, the storage array may implement services such as RAID, de-duplication, replication, garbage collection, or other filesystem management services that control exactly where each piece of a filesystem resides, on which physical disk, and at what time. These backend management services introduce further buffering and/or latency that the virtual tape emulator can neither control nor observe.
In prior art approaches, it was assumed that backend storage arrays would successfully complete the data writes. Thus no verification of virtual tape integrity was performed, and integrity was tested only incidentally, if and when the mainframe eventually read the virtual tape contents back. This approach did not give immediate feedback to the mainframe program during the virtual tape sync process, and thus the mainframe could not perform timely error reporting or recovery operations, such as informing the application so that the application could rewrite the data. The host application will have completed under the premature assumption that the data is safely written on the virtual tape, and may then take subsequent actions that result in irretrievable data loss, such as deleting the original data from a source elsewhere available to the mainframe.