In recent years, as the amount of data to be processed has increased, the cost of the storage that holds the data and its backup data has also increased. In some cases, identical data blocks are stored in the storage repeatedly.
Therefore, it has been proposed to reduce the cost of the storage by using a deduplication technique that prevents identical data blocks from being stored in the storage.
In the deduplication technique described above, when a data block is written to the storage, whether a data block identical to the data block to be written already exists is detected by using, for example, a hash value. In a case where no identical data block exists, the data block to be written is written to the storage. In a case where an identical data block exists, the data block to be written is not written to the storage, so that duplication is avoided.
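The hash-based check described above can be illustrated with a minimal sketch. The class name `DedupStore`, its fields, and the choice of SHA-256 are assumptions made for illustration only; the text states only that a hash value may be used, and an actual storage would also have to handle hash collisions.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}  # hash digest -> block contents
        self.writes = 0   # number of physical writes actually performed

    def write(self, block: bytes) -> str:
        # Assumption: SHA-256 is used as the hash value of the block.
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.blocks:
            # No identical block exists, so the block is written.
            self.blocks[digest] = block
            self.writes += 1
        # An identical block exists: nothing is written, the digest is reused.
        return digest


store = DedupStore()
a = store.write(b"hello")
b = store.write(b"hello")  # duplicate of the first block
c = store.write(b"world")
```

In this sketch, the second write of `b"hello"` returns the same digest as the first and does not increase `store.writes`.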
The reference frequency of information such as data blocks generally tends to decrease once a predetermined period of time has elapsed after the information is generated. Because of this tendency, and as opportunities to utilize big data have increased in recent years, there are cases in which data remains stored in a high-performance storage for a long period of time without being referenced, which causes a decrease in the performance of the storage.
Therefore, it has been proposed to improve the performance of the storage by a technique (hierarchization technique) for hierarchizing data arrangement by using a hierarchical storage that includes a plurality of storage devices with different performance levels. As the plurality of storage devices, for example, a storage class memory (SCM), a solid state drive (SSD), and a hard disk drive (HDD) are used.
In the hierarchization technique described above, data access to the storage is monitored for each address (that is, for the data block stored at the address), and the access frequency of each address is detected. Then, data blocks are rearranged among the storage devices on the basis of the detected access frequency and a predetermined policy. For example, data blocks whose access frequency is high are arranged in a storage device with a high processing speed, and data blocks whose access frequency is low are arranged in an inexpensive storage device with a slow processing speed.
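The frequency-based rearrangement described above can be sketched as follows. The function name `assign_tiers`, the threshold values, and the mapping of frequency ranges to SCM, SSD, and HDD are hypothetical; the text refers only to "a predetermined policy" and does not specify one.

```python
from collections import Counter


def assign_tiers(access_log, hot_threshold=3, warm_threshold=2):
    """Map each address to a tier based on how often it appears in access_log.

    The thresholds are assumed values standing in for the predetermined policy.
    """
    counts = Counter(access_log)  # access frequency per address
    placement = {}
    for addr, n in counts.items():
        if n >= hot_threshold:
            placement[addr] = "SCM"  # frequently accessed: fastest device
        elif n >= warm_threshold:
            placement[addr] = "SSD"
        else:
            placement[addr] = "HDD"  # rarely accessed: inexpensive, slow device
    return placement


# Address 0x10 is accessed three times, 0x20 twice, and 0x30 once.
log = [0x10, 0x10, 0x10, 0x20, 0x20, 0x30]
placement = assign_tiers(log)
```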
Related techniques are disclosed in, for example, Japanese Laid-open Patent Publication No. 2014-41452, Japanese Laid-open Patent Publication No. 2009-205201, and Japanese Laid-open Patent Publication No. 2009-129076.
In a case where both the deduplication technique and the hierarchization technique are applied to the storage, for example, the hierarchization technique is applied after the deduplication technique is applied.
In a case where data blocks are written to a specific address many times, the appearance frequency (access frequency) of each individual data block is low, so the hierarchization technique arranges each of the data blocks in a storage device with a slow processing speed. In addition, although the data block should intrinsically be overwritten on the actual storage device as well, the contents of the data blocks differ from each other, so the deduplication technique assigns a new address to each of the data blocks and each of the data blocks is written to the storage device. For this reason, storage areas in the storage are used wastefully, the processing amount of garbage collection increases, and the performance of the storage decreases.
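The situation described above can be reproduced with a small simulation. All names here (`logical_map`, `physical`, `block_hits`, `write`) are hypothetical, and the sketch models only the bookkeeping, not any actual storage device: one logical address is overwritten five times with different contents, and each version is stored as a separate block because its hash differs from the others.

```python
import hashlib

logical_map = {}  # logical address -> digest of the current contents
physical = {}     # digest -> block contents (one entry per stored block)
block_hits = {}   # digest -> number of times this exact content appeared


def write(addr: int, block: bytes) -> None:
    digest = hashlib.sha256(block).hexdigest()
    if digest not in physical:
        # Contents differ from every stored block, so a new block is stored
        # instead of overwriting the previous one in place.
        physical[digest] = block
    block_hits[digest] = block_hits.get(digest, 0) + 1
    logical_map[addr] = digest


# The same logical address is overwritten five times with different contents.
for i in range(5):
    write(0x100, f"version {i}".encode())

live = set(logical_map.values())
stale = [d for d in physical if d not in live]
```

In this sketch every stored block has an appearance count of 1, so a frequency-based policy would place all of them on the slowest device, and four of the five stored blocks are no longer referenced by any logical address.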
Garbage collection is a function of releasing an area that stores an unnecessary data block, for example, by discarding, as unnecessary data blocks, the data blocks that were wastefully written at new addresses as described above.
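As a rough illustration of this release step, the following sketch discards every stored block that is no longer referenced by any logical address. The function name `collect_garbage`, the dictionary layout, and the keys `p0` to `p2` are assumptions for illustration, not the mechanism of any particular storage.

```python
def collect_garbage(physical: dict, logical_map: dict) -> int:
    """Release stored blocks that no logical address refers to.

    physical maps a block key to its contents; logical_map maps a logical
    address to the key of the block it currently refers to.
    Returns the number of blocks released.
    """
    live = set(logical_map.values())
    stale = [key for key in physical if key not in live]
    for key in stale:
        del physical[key]  # release the area holding the unnecessary block
    return len(stale)


# Three versions were written for one address; only the last is still referenced.
physical = {"p0": b"version 0", "p1": b"version 1", "p2": b"version 2"}
logical_map = {0x100: "p2"}
released = collect_garbage(physical, logical_map)
```

The work done by this step grows with the number of unreferenced blocks, which is consistent with the increase in garbage collection load noted above.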