Storage appliances of different kinds are known. One particular type of storage appliance is a virtual tape library (VTL). Virtual tape libraries emulate one or more physical tape drives so that an application program running on a host computer can write and read tape volumes. The tape volumes received from the host computer may be stored in different forms, e.g. on an internal file system of the virtual tape library or on physical tape volumes attached to a back-end storage system. A possible system architecture of such a virtual tape library is described, for example, in U.S. Pat. No. 6,963,958 B1. ETERNUS CS High End of Fujitsu Technology Solutions GmbH is such a storage virtualization appliance that emulates a physical tape library. It provides virtual tape drives for I/O and logical volumes on which data is stored.
Virtual tape libraries are often used for regular backup of large data sets such as central data stores of medium to large companies and data centers. Due to the nature of the backup procedure and the respective data sources, typically more or less the same data is stored many times, resulting in redundant storage and thus waste of resources.
To improve resource utilization, more recently, systems and methods of data deduplication, sometimes referred to as “de-dupe” or “dedup”, have been developed. For example, U.S. Pat. No. 8,131,924 B1 discloses a system and method of deduplication of data stored on tape.
With a deduplication engine integrated into a storage appliance, it becomes possible to store virtual volume data in deduplicated form using only a fraction of physical disk space of a virtual storage appliance.
Deduplication engines store data as objects. Once an object has been stored, it can be retrieved or deleted. However, an existing object cannot be modified.
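The object semantics described above (store, retrieve and delete, but no modify) can be illustrated by a minimal sketch. It is not the interface of any particular deduplication engine. The class name DedupStore, the fixed chunk size of 4096 bytes, the use of SHA-256 digests and the reference counting scheme are all assumptions made for illustration only.

```python
import hashlib


class DedupStore:
    """Toy write-once object store: store, retrieve, delete; no modify."""

    CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes

    def __init__(self):
        self._chunks = {}   # digest -> [chunk bytes, reference count]
        self._objects = {}  # object id -> ordered list of chunk digests
        self._next_id = 0

    def store(self, data: bytes) -> int:
        """Split data into chunks, keep each distinct chunk once, return an id."""
        digests = []
        for offset in range(0, len(data), self.CHUNK_SIZE):
            chunk = data[offset:offset + self.CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            entry = self._chunks.setdefault(digest, [chunk, 0])
            entry[1] += 1  # one more reference to this chunk
            digests.append(digest)
        object_id = self._next_id
        self._next_id += 1
        self._objects[object_id] = digests
        return object_id

    def retrieve(self, object_id: int) -> bytes:
        """Reassemble the object from its chunks."""
        return b"".join(self._chunks[d][0] for d in self._objects[object_id])

    def delete(self, object_id: int) -> None:
        """Drop the object and free chunks that are no longer referenced."""
        for digest in self._objects.pop(object_id):
            entry = self._chunks[digest]
            entry[1] -= 1
            if entry[1] == 0:
                del self._chunks[digest]

    def physical_bytes(self) -> int:
        """Bytes actually held after deduplication."""
        return sum(len(chunk) for chunk, _ in self._chunks.values())
```

In this sketch, storing the same volume data twice yields two object identifiers but does not increase the value returned by physical_bytes(). Because no modify operation is offered, changing an object means storing a new object and deleting the old one.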
One possible method of storing a tape volume in a virtual tape library is to buffer the volume data on internal disks and perform offline deduplication by reading the data from disk, applying the deduplication algorithm and storing the reduced data.
Offline deduplication requires additional disk storage resources and causes more I/O cycles to store a piece of data into the deduplication engine. This means higher costs and performance penalties.
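The additional disk space and I/O of offline deduplication can be made concrete with a short sketch. It buffers a volume in a temporary file, reads it back, and stores only previously unseen chunks. The function name offline_deduplicate, the in-memory dictionary standing in for the deduplication engine, the chunk size and the byte counters are hypothetical and serve only to illustrate the data path described above.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 4096  # hypothetical fixed chunk size in bytes


def offline_deduplicate(volume: bytes, chunk_store: dict) -> dict:
    """Buffer a volume on disk, then deduplicate it in a second pass.

    chunk_store is a plain dict (digest -> chunk) standing in for a
    deduplication engine. Returns byte counters for each I/O stage.
    """
    stats = {
        "buffer_bytes_written": 0,  # first pass: host data written to disk
        "buffer_bytes_read": 0,     # second pass: same data read back
        "dedup_bytes_written": 0,   # unique chunks stored in the engine
    }

    # Step 1: buffer the complete volume on an internal disk.
    with tempfile.NamedTemporaryFile(delete=False) as buffer_file:
        buffer_file.write(volume)
        path = buffer_file.name
    stats["buffer_bytes_written"] = len(volume)

    try:
        # Step 2: read the buffered data back and deduplicate it.
        with open(path, "rb") as buffer_file:
            while True:
                chunk = buffer_file.read(CHUNK_SIZE)
                if not chunk:
                    break
                stats["buffer_bytes_read"] += len(chunk)
                digest = hashlib.sha256(chunk).hexdigest()
                if digest not in chunk_store:
                    chunk_store[digest] = chunk
                    stats["dedup_bytes_written"] += len(chunk)
    finally:
        # The buffer space is only released after the second pass.
        os.remove(path)

    return stats
```

For a volume of N bytes, the sketch writes N bytes to the buffer and reads N bytes back before the reduced data is stored, so every byte of the volume crosses the disk interface twice in addition to the final write of the unique chunks.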
Therefore, it could be helpful to provide improved storage systems and methods of the kind described above.