The present invention relates generally to storage systems and, more particularly, to improvement of deduplication efficiency for hierarchical storage systems.
Recently, deduplication technology has become very popular. The technology involves two control methods. The first method finds identical data. The second method shares the identical data that was found between two objects (an “object” includes a “file”, a “region”, and so on). Both methods are described, for example, in U.S. Pat. No. 6,928,526 on a control method of deduplication using a hash table and page management. The first method uses a hash table, in which each hash value is generated from stored data. The second method uses page (segment) management: if two objects have the same data, the pages in the two objects refer to the same stored data.
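The two control methods can be illustrated with a minimal sketch. It is not the method of the cited patent; the class name `DedupStore`, the 4 KB page size, and the use of SHA-256 as the hash function are assumptions made only for illustration. The hash table finds identical pages, and the per-object page lists share them by referring to the same stored data.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


class DedupStore:
    """Hypothetical page store: a hash table finds duplicates, page lists share them."""

    def __init__(self):
        self.hash_table = {}  # hash value -> index into self.stored
        self.stored = []      # unique page contents, stored once each
        self.objects = {}     # object name -> list of indices into self.stored

    def write(self, name, data):
        refs = []
        for off in range(0, len(data), PAGE_SIZE):
            page = data[off:off + PAGE_SIZE]
            digest = hashlib.sha256(page).digest()
            idx = self.hash_table.get(digest)
            if idx is None:
                # First method: no matching hash, so this is new data.
                idx = len(self.stored)
                self.stored.append(page)
                self.hash_table[digest] = idx
            # Second method: the object's page refers to the stored data.
            refs.append(idx)
        self.objects[name] = refs

    def read(self, name):
        return b"".join(self.stored[i] for i in self.objects[name])


store = DedupStore()
shared = b"A" * PAGE_SIZE
store.write("file1", shared + b"B" * PAGE_SIZE)
store.write("file2", shared + b"C" * PAGE_SIZE)
```

Here four pages are written but only three are stored, because the first page of each object refers to the same stored data. This sketch trusts the hash value alone; a production design might also compare the page contents before sharing.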
In general, the number of pages increases with storage capacity, and the capacity of the table that manages the pages increases with the number of pages. When the storage has many pages and a large table capacity, either the storage performance drops because of the complex table structure, or the storage price rises because of the richer hardware configuration required. For example, if the hash data is 128 bits per 4 KB of stored data, 1 PB of storage requires a 4 TB hash table.
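The 4 TB figure can be checked with a short calculation. The function name `hash_table_bytes` is hypothetical, and binary units (1 KB = 1024 bytes) are assumed.

```python
def hash_table_bytes(capacity_bytes, page_bytes=4 * 1024, hash_bits=128):
    """Size of a hash table holding one hash value per stored page."""
    pages = capacity_bytes // page_bytes
    return pages * (hash_bits // 8)


PB = 1024 ** 5
TB = 1024 ** 4

size = hash_table_bytes(1 * PB)
print(size // TB)  # prints 4
```

One petabyte holds 2^38 pages of 4 KB, and each page needs 16 bytes of hash data, which gives 2^42 bytes, or 4 TB.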
To resolve that issue, the configuration described in US 2008/0244196 on storage functions with a standardized storage interface is useful. A storage device can mount one or more VTLs (Virtual Tape Libraries) using a technology described in U.S. Pat. No. 7,711,896 on a control method of an external volume function, if the VTL supports a block storage interface. Strictly speaking, the term “VTL” is not correct here, because “VTL” denotes storage that has a tape interface and a deduplication function. With this configuration, the storage device can provide a deduplication function without holding a deduplication management table (including the hash table) itself. The storage device can offload the control and management of the deduplication function to the external VTLs. However, in this configuration, it is difficult to deduplicate when the same data is stored in two VTLs, because neither the storage device nor either VTL knows that the same data is stored in the other VTL.
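The limitation can be seen in a small sketch, assuming each external device keeps a private hash table that the other devices cannot consult. The class `Vtl` and its methods are hypothetical and do not model any real product or interface.

```python
import hashlib


class Vtl:
    """Hypothetical external device with its own private deduplication table."""

    def __init__(self):
        self.pages = {}  # hash value -> page data, visible only to this device

    def write(self, page):
        # Store the page only if this device has not seen its hash before.
        self.pages.setdefault(hashlib.sha256(page).digest(), page)

    def stored_pages(self):
        return len(self.pages)


vtl_a, vtl_b = Vtl(), Vtl()
page = b"x" * 4096

vtl_a.write(page)
vtl_a.write(page)  # deduplicated inside vtl_a
vtl_b.write(page)  # vtl_b cannot see vtl_a's table, so it stores a second copy

total = vtl_a.stored_pages() + vtl_b.stored_pages()
```

Each device holds one copy, so the same page is stored twice across the system even though each device deduplicates correctly on its own.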