1. Field of the Invention
The present invention relates generally to computers, and more particularly to mechanisms for reducing and/or eliminating duplicate written records in a computing storage environment.
2. Description of the Related Art
So-called “deduplication” (also referred to herein as “dedup”) is a technique for reducing the amount of storage capacity used. In this technique, when multiple files have the same content, the storage does not store that content once per file, as a conventional technique does, but stores it in only one location. Whichever of the multiple files is referenced in the storage, the same entity is referenced. Only when the data of one of the files is changed is the changed content newly stored in a different location. The technique thus reduces the amount of storage capacity used as a whole. The achievable reduction depends on the content of the stored data, but in particular usage cases the used amount can be reduced to one tenth to one twentieth of that of the conventional technique.
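The single-copy behavior described above can be illustrated with a minimal sketch. It is a toy in-memory model, not an implementation from this disclosure: the class name `DedupStore`, its methods, and the choice of SHA-256 as the content identifier are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps each distinct content only once.

    Hypothetical illustration; names and hash choice are assumptions.
    """

    def __init__(self):
        self.blocks = {}  # content hash -> bytes (the single stored copy)
        self.files = {}   # file name -> content hash (a reference only)

    def write(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blocks.setdefault(digest, data)
        self.files[name] = digest

    def read(self, name):
        # Every file with the same content resolves to the same entity.
        return self.blocks[self.files[name]]

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
payload = b"same content" * 1000

store.write("a.txt", payload)
store.write("b.txt", payload)
bytes_after_duplicate = store.stored_bytes()  # one copy serves both files

# Changing one file stores the new content in a different location;
# the other file still refers to the original copy.
store.write("b.txt", payload + b"!")
bytes_after_change = store.stored_bytes()
```

In this sketch the two identical files occupy the space of one, and a second copy appears only after one of them is modified.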
In writing data to a tape, a host first writes data to a tape drive in variable-length units termed records. A tape medium has a capacity of about 1 TB. When dedup is performed in the tape drive, there is a problem in that it takes time to check whether data to be newly written is the same as data already written.
A tape drive (for example, the IBM® TS1120, which is a tape drive for enterprise use) is a sequential device, that is, a storage device configured to execute write and read operations sequentially at physical locations on a tape medium. Generally, the dedup technique requires the generation and comparison of hash values that indicate whether write data duplicates data already recorded. Suppose the sequential device receives data to be newly written from an upper-layer device, and the data is expected to be identical to data already written to the tape medium. To confirm this, the sequential device requires an average of approximately two minutes to move the tape to the position of the already written data, read it, and then return the tape to the previous position for writing. For this reason, there has so far been no tape drive product that supports the dedup function by itself.
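The hash generation and comparison mentioned above, and the cost of confirming a match on a sequential device, can be sketched as follows. This is an illustration of the general problem only, not a method taken from this disclosure: `RecordIndex`, `record_hash`, `verification_cost_seconds`, and the use of SHA-256 are hypothetical, and the 120-second figure simply restates the "approximately two minutes" average given in the text.

```python
import hashlib

# Assumption: the average round trip (seek, read, return) stated in the text.
SEEK_ROUND_TRIP_SECONDS = 120


def record_hash(record: bytes) -> str:
    """Generate a hash value for one variable-length record."""
    return hashlib.sha256(record).hexdigest()


class RecordIndex:
    """Hypothetical index of hash values for records already written."""

    def __init__(self):
        self._first_seen = {}  # hash value -> number of first record with it
        self._next_number = 0

    def append(self, record: bytes):
        """Register a record; return (record_number, duplicate_of).

        duplicate_of is the number of an earlier record with the same hash
        value, or None if no such record has been written.
        """
        digest = record_hash(record)
        duplicate_of = self._first_seen.get(digest)
        if duplicate_of is None:
            self._first_seen[digest] = self._next_number
        number = self._next_number
        self._next_number += 1
        return number, duplicate_of


def verification_cost_seconds(candidate_count: int) -> int:
    """Time spent if every hash match is confirmed by re-reading the tape."""
    return candidate_count * SEEK_ROUND_TRIP_SECONDS
```

Comparing hash values is fast, but confirming each candidate by repositioning the tape is not: under the stated average, confirming 30 candidate duplicates this way would take about an hour of tape movement alone.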
Conventional techniques include a method of copying only the difference between two storages in order to synchronize them while reducing power consumption, and a method of backing up already deduplicated data to a tape or the like. In neither of these techniques is the dedup executed by a tape drive alone.