Widely used storage systems include a disk array and a storage apparatus configured to control the disk array. The disk array has a plurality of storage media, such as HDDs (Hard Disk Drives), connected to it and provides redundancy using the RAID (Redundant Arrays of Inexpensive Disks) technique or the like. The storage apparatus allocates a virtual storage area (virtual volume) to a storage area on the disk array (physical volume) and controls data read and write processes via the virtual volume.
The storage apparatus may divide a single virtual volume into a plurality of logical volumes and manage data in units of logical volumes. In addition, the storage apparatus may set a logical data storage area (container), which is distinct from the logical volume, and manage data using the container. A container may also be referred to as a chunk.
A technique called deduplication is known which, when identical data sets are stored redundantly in a plurality of storage areas of a storage system such as the one described above, removes the redundant copies from some of those storage areas to increase their free space. Deduplication thus allows the storage areas of the disk array to be used efficiently.
For a system that uploads data to an online storage service, a technique has been proposed that manages the items to be uploaded as a plurality of configuration blocks and performs the upload after eliminating duplicate configuration blocks. The system eliminates duplicates using a reference count that indicates the degree of duplication of each configuration block. Another proposed technique stores data with attached hash codes in a logical volume and detects redundant data by comparing the hash codes when performing deduplication.
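To make the combination of hash-based duplicate detection and reference counting concrete, the following is a minimal sketch under stated assumptions, not the method of either cited publication. The class name `DedupStore`, the use of SHA-256 as the hash code, and the `(volume, address)` location key are all hypothetical choices for illustration. For simplicity the sketch treats equal digests as equal content; a production system might additionally compare the bytes to guard against hash collisions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DedupStore:
    """Hypothetical block store keeping one physical copy per distinct content.

    Assumption: each block is identified by the SHA-256 digest of its content,
    and a reference count records how many logical locations share that copy.
    """
    blocks: dict = field(default_factory=dict)        # digest -> stored bytes
    refcount: dict = field(default_factory=dict)      # digest -> number of referencing locations
    location_map: dict = field(default_factory=dict)  # (volume, address) -> digest

    def write(self, volume: str, address: int, data: bytes) -> None:
        # If the location already holds data, release its old reference first.
        self._release((volume, address))
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            # First occurrence of this content: store it physically.
            self.blocks[digest] = data
            self.refcount[digest] = 0
        # Duplicate or not, this location now references the shared copy.
        self.refcount[digest] += 1
        self.location_map[(volume, address)] = digest

    def read(self, volume: str, address: int) -> bytes:
        return self.blocks[self.location_map[(volume, address)]]

    def _release(self, loc) -> None:
        digest = self.location_map.pop(loc, None)
        if digest is None:
            return
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # No location references this content any more: free the copy.
            del self.refcount[digest]
            del self.blocks[digest]
```

Writing the same bytes to two different logical volumes leaves a single stored copy with a reference count of 2; overwriting one of the locations with different content decrements that count rather than deleting the shared copy.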
See, for example, Japanese Laid-open Patent Publication Nos. 2012-141738 and 2009-251725.
The storage system may include a cache device configured to temporarily store data that the storage apparatus has read from the disk array in response to a data read request. The cache device has a higher data read speed than the disk array. Accordingly, when the storage apparatus can respond to a data read request with data stored in the cache device, the response time may be shortened.
The storage apparatus identifies a data set to be read based on the data storage location (an address on a logical volume, or the like) specified in the read request. In other words, the storage apparatus treats data sets that are stored in different logical volumes but have the same content as distinct. Therefore, in a conventional storage apparatus, even when the cache device holds a data set with the same content as the data set to be read, the requested data set is read from the disk array and returned if the storage locations of the two data sets differ.
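The conventional behavior described above can be illustrated with a minimal sketch, assuming a cache keyed only by storage location. The class `AddressKeyedCache`, the dictionary standing in for the disk array, and the `disk_reads` counter are hypothetical and serve only to show why identical content at a different location still causes a disk access.

```python
class AddressKeyedCache:
    """Hypothetical conventional read cache keyed by storage location."""

    def __init__(self, disk):
        # disk: dict mapping (volume, address) -> bytes; stands in for the disk array.
        self.disk = disk
        self.cache = {}       # (volume, address) -> cached bytes
        self.disk_reads = 0   # number of accesses to the slower disk array

    def read(self, volume, address):
        key = (volume, address)
        if key in self.cache:
            # Hit only if this exact location was read before.
            return self.cache[key]
        # Miss: read from the disk array, even if identical content is
        # already cached under a different location.
        self.disk_reads += 1
        data = self.disk[key]
        self.cache[key] = data
        return data
```

With two locations holding the same bytes, reading both costs two disk accesses, because the cache lookup compares locations, not content.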
If, instead, a data set in the cache device that has the same content as the requested data set but a different storage location could be read from the cache device and returned, the opportunities to respond quickly to read requests would increase. As a result, an improvement in the performance of the storage system may be expected.
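One way to picture this idea is a cache keyed by content rather than by location. The sketch below is a hypothetical illustration, not the claimed invention: it assumes the storage apparatus already knows a fingerprint for every location (for example, hash codes maintained as deduplication metadata), so the content identity of a request is known before the disk array is accessed. The class name `ContentKeyedCache`, the SHA-256 fingerprints, and the precomputed `fingerprints` table are all assumptions.

```python
import hashlib


class ContentKeyedCache:
    """Hypothetical read cache that serves any location whose content is cached.

    Assumption: a fingerprint per location is available in advance (e.g. kept
    as deduplication metadata at write time); here it is precomputed from the
    simulated disk for illustration.
    """

    def __init__(self, disk):
        # disk: dict mapping (volume, address) -> bytes; stands in for the disk array.
        self.disk = disk
        self.fingerprints = {loc: hashlib.sha256(d).hexdigest() for loc, d in disk.items()}
        self.cache = {}       # fingerprint -> cached bytes
        self.disk_reads = 0   # number of accesses to the slower disk array

    def read(self, volume, address):
        fp = self.fingerprints[(volume, address)]
        if fp in self.cache:
            # Hit even when the cached copy was originally read from another location.
            return self.cache[fp]
        self.disk_reads += 1
        data = self.disk[(volume, address)]
        self.cache[fp] = data
        return data
```

Reading two locations that hold the same bytes now costs a single disk access. The design trades a fingerprint lookup per request for more cache hits, which pays off only when duplicate content is common.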