The present invention relates to storage and transfer of data in a virtual tape library environment. More specifically, the invention is a method and system for compressing data in a virtual tape library to conserve disk space and for computing an estimated compression ratio to provide a one-to-one correspondence in size between virtual and physical tapes.
Virtual tape libraries emulate physical tape libraries in order to handle backup data more efficiently. When users need to generate physical tapes, for example for off-site storage or data interchange, the desired data must be written from the virtual tapes (which at least initially contain the data) to physical tapes. Unlike mainframe virtual tape systems, which stack multiple virtual tapes onto a single physical tape, open systems environments favor a one-to-one correspondence in size between virtual and physical tapes.
The majority of modern physical tape drives compress data before it is stored on the physical tape media. The outcome depends not only on the data itself, but also on the physical tape's storage capacity and the tape drive's compression algorithm. These dependencies make it impossible to select an appropriate fixed data capacity for the virtual tape in advance. As an illustration, if a virtual tape is fixed at 20 GB because the physical tape holds 20 GB, a large portion of the physical tape may remain unused once the data is transferred from the virtual tape to the physical tape, because the tape drive might compress the data down to 10 GB, for example. If, on the other hand, a 30 GB virtual tape is created for a 20 GB physical tape in an attempt to account for data compression and use the physical media more efficiently, incompressible (random) data may be written to that virtual tape. In this case, when the virtual tape is exported onto the physical tape, only the first 20 GB will fit on the physical media. While the first option is preferable, clearly neither choice is satisfactory. Therefore, it would be desirable to dynamically ensure that the amount of data written to each virtual tape is large enough not to waste physical resources yet small enough not to exceed the capacity of the physical tape.
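The arithmetic behind the two cases in the illustration above can be worked through in a short Python sketch. It is illustrative only: the function names `export_outcome` and `ideal_virtual_gb` are hypothetical, and the compression ratios (2:1 and 1:1) are assumed values chosen to match the example, since the ratio a real drive achieves is not known in advance.

```python
def export_outcome(virtual_gb, physical_gb, compression_ratio):
    """Return (unused_gb, overflow_gb) on the physical tape after export.

    compression_ratio is uncompressed size / compressed size, as achieved
    by the tape drive. It is an assumed input here; in practice it is not
    known before the data is written.
    """
    compressed_gb = virtual_gb / compression_ratio
    unused_gb = max(0.0, physical_gb - compressed_gb)    # wasted physical capacity
    overflow_gb = max(0.0, compressed_gb - physical_gb)  # compressed data that does not fit
    return unused_gb, overflow_gb


def ideal_virtual_gb(physical_gb, compression_ratio):
    """Virtual tape size that would exactly fill the physical tape."""
    return physical_gb * compression_ratio


# Case 1: 20 GB virtual tape, data compresses 2:1, 20 GB physical tape.
print(export_outcome(20, 20, 2.0))   # (10.0, 0.0): half the tape is unused

# Case 2: 30 GB virtual tape, incompressible data (1:1), 20 GB physical tape.
print(export_outcome(30, 20, 1.0))   # (0.0, 10.0): 10 GB does not fit

# With 2:1 data, a 40 GB virtual tape would fill the 20 GB physical tape.
print(ideal_virtual_gb(20, 2.0))     # 40.0
```

The sketch shows why no single fixed virtual tape size works: the size that exactly fills the physical tape varies with the compression ratio, which in turn varies with the data.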
Additionally, while physical tape drives typically compress data before writing it to tape, existing virtual tape libraries do not include this feature. This is mostly due to the large amount of processing power required to compress high-bandwidth data streams in real time. It would, therefore, also be desirable to store data compactly on random access media, either in real time (as data is written to the virtual device) or at a later time when more processing power is available. In either case, compact storage is preferable because it keeps the footprint of virtual tapes small and thus saves comparatively expensive random access storage space.
A need therefore exists for a method and system for compressing data written to a virtual tape library (in real time or otherwise) for efficient storage thereof and for computing an estimated compression ratio in real time to dynamically provide a one-to-one correspondence in size between virtual tapes and physical tapes.