Large-scale mainframe computers continue to be used extensively across many industries. Historically, tape storage has been used to provide permanent and temporary data protection services to those mainframes. In such environments, it is not uncommon for mainframe tape libraries to hold hundreds of terabytes (TB) of data spread across tens of thousands of tape volumes.
Virtual tape emulation (VTE) products, such as DLm available from EMC Corporation of Hopkinton, Mass., can be used to emulate a given number of tape volumes to the mainframe using disk drives as the storage media instead of magnetic tape. As a mainframe-based application writes data to what it believes is a tape drive, that data is actually stored as a tape volume image on a direct access storage device such as a disk array subsystem. Each individual tape volume written by the mainframe becomes a single disk file on the filesystem on the disk array. Such VTE products ultimately allow the operators of mainframe data centers to move from a tape-based backup solution to a disk-based backup solution, thus leveraging present low-cost disk technology to provide cost-efficient data storage solutions.
In a VTE system, the mainframe host writes data to the virtual tape drive using the same commands it would use if it were writing to an actual magnetic tape drive. The normal flow of data written from a mainframe host to a virtual tape drive (such as the EMC DLm) is a sequential process in which the mainframe writes a data block, the data block is received by the virtual tape drive, the data block is compressed, and the data block is written to the virtual tape file on the virtual tape server. After the write operation, an acknowledgement is sent to the host, at which time the process repeats for the next data block. In general, data compression is the most time-consuming step of this process, and is the most significant factor affecting performance. Thus, because present VTE write processes handle each block sequentially, overall system performance is limited by the slowest step in the process, which is the data compression step. Other block-level data handling operations can also benefit from parallelization, including encryption, deduplication, compression checks, data rearrangement, and the like.
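The sequential write flow described above, and one way compression might be parallelized, can be illustrated with a minimal Python sketch. It is not the DLm implementation. Several details are assumptions made only for illustration: the function names, the use of `zlib` as the compressor, the 4-byte length prefix used to frame each block in the file, and the list of sequence numbers standing in for acknowledgements. The parallel variant additionally assumes that several blocks are already available before earlier ones have been written, which the strictly sequential host protocol described above does not by itself provide.

```python
import io
import zlib
from concurrent.futures import ThreadPoolExecutor


def write_blocks_sequential(blocks, tape_file):
    """Handle one block at a time: compress, write, then acknowledge."""
    acks = []
    for seq, block in enumerate(blocks):
        compressed = zlib.compress(block)  # typically the slowest step
        # Hypothetical framing: 4-byte big-endian length, then the payload.
        tape_file.write(len(compressed).to_bytes(4, "big"))
        tape_file.write(compressed)
        acks.append(seq)  # stand-in for the acknowledgement sent to the host
    return acks


def write_blocks_parallel(blocks, tape_file, workers=4):
    """Compress blocks concurrently, but write them in their original order."""
    acks = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so tape order is kept.
        for seq, compressed in enumerate(pool.map(zlib.compress, blocks)):
            tape_file.write(len(compressed).to_bytes(4, "big"))
            tape_file.write(compressed)
            acks.append(seq)
    return acks


def read_blocks(tape_file):
    """Read back the length-prefixed blocks and decompress each one."""
    blocks = []
    while True:
        header = tape_file.read(4)
        if not header:
            break
        size = int.from_bytes(header, "big")
        blocks.append(zlib.decompress(tape_file.read(size)))
    return blocks
```

Both writers produce the same file contents for the same input. In the parallel variant only the compression step overlaps across blocks, while the writes to the virtual tape file remain ordered so that the tape volume image stays sequential.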