A storage device which has a function of eliminating duplicated storage is known as a technique for efficiently handling an enormous amount of data.
In a storage system that performs deduplication as mentioned above, new data is, for example, appended to the end of a storage area. Therefore, when the data is retrieved later, a very large number of disk operations may be needed in order to read block data dispersed across the whole storage device.
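The dispersion described above can be illustrated with a minimal sketch. It assumes a toy append-only store in which each unique block is written once to the end of a log; the names `AppendOnlyDedupStore`, `write_stream`, and `count_seeks`, as well as the use of SHA-256 as the block fingerprint, are hypothetical choices made for illustration and are not taken from any cited document.

```python
import hashlib


class AppendOnlyDedupStore:
    """Toy deduplicating store: each unique block is appended once to a log."""

    def __init__(self):
        self.log = []    # physical storage area; list index = physical position
        self.index = {}  # block fingerprint -> physical position

    def write_stream(self, blocks):
        """Store a stream and return its recipe (physical position per block)."""
        recipe = []
        for block in blocks:
            key = hashlib.sha256(block).hexdigest()
            if key not in self.index:
                # New data goes to the end of the storage area.
                self.index[key] = len(self.log)
                self.log.append(block)
            recipe.append(self.index[key])
        return recipe


def count_seeks(recipe):
    """Count reads that do not continue from the previously read position."""
    seeks = 0
    prev = None
    for pos in recipe:
        if prev is None or pos != prev + 1:
            seeks += 1
        prev = pos
    return seeks


store = AppendOnlyDedupStore()
recipe1 = store.write_stream([b"A", b"B", b"C", b"D"])  # first backup
recipe2 = store.write_stream([b"A", b"X", b"C", b"Y"])  # later, partly changed backup

print(recipe1, count_seeks(recipe1))
print(recipe2, count_seeks(recipe2))
```

In this sketch the first stream is laid out sequentially, whereas the second stream refers to positions scattered between old and newly appended blocks, so reading it back requires a non-sequential access for every block.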
A technique for dealing with the abovementioned problem is described in, for example, Patent Document 1. Patent Document 1 describes a storage device which has a plurality of storage media, a cache memory, and a control part that controls input/output of data into/from the storage media. According to Patent Document 1, the control part provides a host device with first and second storage areas which are configured by the storage areas of the plurality of storage media and have the same performance characteristics. Specifically, the control part stores a first data stream, which is a deduplicated data stream, into the first storage area, and stores a second data stream, which is generated on the basis of the data stream before deduplication, into sequential areas of a physical area configured by the second storage area. According to Patent Document 1, such a configuration enables storage of the deduplicated first data stream into the first storage area and storage of the second data stream into sequential areas of the physical area configured by the second storage area. As a result, according to Patent Document 1, it becomes possible to stage the data stored in the sequential areas instead of the deduplicated and fragmented data, and thereby to increase access performance.
Further, a technique for dealing with the abovementioned problem is also described in, for example, Patent Document 2. Patent Document 2 describes a storage device which has a data dividing means, a block detecting means, and a data writing means. According to Patent Document 2, the block detecting means detects a common rate, which represents the rate of a common portion between a plurality of sequential block data configuring a given range in writing target data among the divided block data and a plurality of block data in a given range already sequentially stored in the storage device. Further, the data writing means newly stores the divided block data into the storage device in accordance with the common rate detected by the block detecting means. According to Patent Document 2, such a configuration enables control so as to newly write block data into the storage device only when the common rate is, for example, smaller than a given threshold. As a result, according to Patent Document 2, it is possible to inhibit dispersion of block data throughout the whole storage area within the storage device. Consequently, it becomes possible to inhibit a decrease in retrieval performance.
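The threshold-based control attributed to Patent Document 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the common rate is taken here to be the fraction of blocks in the incoming range that also appear in the stored range, and the function names `common_rate` and `should_rewrite` and the threshold value 0.5 are hypothetical. The exact definition used in Patent Document 2 is not given in this section.

```python
def common_rate(incoming_range, stored_range):
    """Fraction of incoming blocks that also occur in the stored range.

    Assumed definition for illustration only.
    """
    if not incoming_range:
        return 0.0
    stored = set(stored_range)
    shared = sum(1 for block in incoming_range if block in stored)
    return shared / len(incoming_range)


def should_rewrite(incoming_range, stored_range, threshold=0.5):
    """Newly write the blocks only when the common rate is below the threshold."""
    return common_rate(incoming_range, stored_range) < threshold


stored = ["A", "B", "C", "D"]          # range already sequentially stored
mostly_shared = ["A", "B", "C", "Z"]   # 3 of 4 blocks in common
mostly_new = ["A", "X", "Y", "Z"]      # 1 of 4 blocks in common

print(common_rate(mostly_shared, stored), should_rewrite(mostly_shared, stored))
print(common_rate(mostly_new, stored), should_rewrite(mostly_new, stored))
```

Under these assumptions, a range that shares most of its blocks with an existing sequential range is deduplicated against it, while a range that shares only a few blocks is written anew so that its blocks stay together.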
Patent Document 1: WO2014-136183
Patent Document 2: JP 2013-541055
However, the technique described in Patent Document 1 requires reserving not only the first storage area, which stores the deduplicated first data stream, but also the second storage area. Therefore, there is a problem that additional capacity of the storage device is consumed. Moreover, with techniques such as those described above, it is difficult to cope with a decrease in retrieval performance caused by the same block appearing two or more times during a series of retrieval processes. In other words, when block data that has been loaded into a cache once is required again, the data may already have been evicted from the cache, and the data may have to be retrieved from a disk again.
Thus, it has remained difficult to inhibit a decrease in retrieval performance in a storage device which has a function of eliminating duplicated storage.
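The cache eviction problem noted above can be illustrated with a small simulation. The sketch assumes a least-recently-used (LRU) replacement policy, which is chosen only for illustration because this section does not name a cache policy; the function name `disk_reads_with_lru` and the block identifiers are likewise hypothetical.

```python
from collections import OrderedDict


def disk_reads_with_lru(read_sequence, capacity):
    """Count disk reads needed to serve a block sequence through an LRU cache."""
    cache = OrderedDict()  # block id -> placeholder, ordered oldest to newest use
    disk_reads = 0
    for block_id in read_sequence:
        if block_id in cache:
            cache.move_to_end(block_id)  # cache hit: mark as most recently used
            continue
        disk_reads += 1                  # cache miss: block must come from disk
        cache[block_id] = True
        if len(cache) > capacity:
            cache.popitem(last=False)    # evict the least recently used block
    return disk_reads


# Block "A" appears twice in the same retrieval.
sequence = ["A", "B", "C", "A"]

reads_small_cache = disk_reads_with_lru(sequence, capacity=2)
reads_large_cache = disk_reads_with_lru(sequence, capacity=3)
print(reads_small_cache, reads_large_cache)
```

With a cache of two blocks, block "A" has already been evicted when it is requested the second time, so it is read from disk twice; with a cache of three blocks the second request is served from the cache.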