This invention relates to a storage system.
In recent years, the amount of data accumulated by companies has increased steeply, creating a strong need for a storage apparatus capable of storing a large amount of data at a low cost. Accordingly, data amount reduction technologies, which reduce the amount of data stored in the storage apparatus and thereby reduce its cost, are attracting attention. One such technology is deduplication, which finds data strings identical to other data strings stored in the storage apparatus and eliminates the redundant data strings, thereby reducing the amount of data stored in the storage apparatus.
US 2013/0226881 A1 discloses the following deduplication technology. In a storage apparatus that can be accessed on a file-by-file basis, when a host coupled to the storage apparatus stores a file, it is detected whether a data string of the file to be stored is identical to another data string already stored in the storage apparatus. A data string that differs from every other data string is stored in the storage apparatus. A duplicated data string, by contrast, is not stored; instead, it is managed through mapping information that maps it to the storage address of the identical data string already stored in the storage apparatus. In this manner, the amount of data stored in the storage apparatus is reduced.
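The mapping scheme described above can be illustrated with a minimal Python sketch. It is not the implementation of the cited publication: the class name `DedupStore`, the fixed chunk size, and the use of SHA-256 digests to detect identical data strings are all assumptions made for illustration.

```python
import hashlib


class DedupStore:
    """Toy file store that keeps one physical copy of each distinct chunk."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # hypothetical fixed data-string length
        self.chunks = []              # physical storage: address -> chunk bytes
        self.index = {}               # content digest -> storage address
        self.mapping = {}             # file name -> ordered storage addresses

    def write_file(self, name, data):
        addresses = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).digest()
            addr = self.index.get(digest)
            # Compare bytes as well, so that a digest collision is never
            # mistaken for a duplicate.
            if addr is None or self.chunks[addr] != chunk:
                addr = len(self.chunks)   # new data string: store it
                self.chunks.append(chunk)
                self.index[digest] = addr
            addresses.append(addr)        # duplicate: only a mapping entry
        self.mapping[name] = addresses

    def read_file(self, name):
        # Restore the original data string by following the mapping.
        return b"".join(self.chunks[a] for a in self.mapping[name])


store = DedupStore(chunk_size=4)
store.write_file("a", b"AAAABBBBCCCC")
store.write_file("b", b"BBBBDDDDAAAA")
print(store.mapping["b"])   # storage addresses referenced by file "b"
print(len(store.chunks))    # number of chunks physically stored
```

In this sketch, file "b" contributes only one new chunk to physical storage; its other two chunks are represented solely by mapping entries that point at chunks written for file "a".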
In the deduplication technology described above, a logical address for which duplication is detected is managed in association with the storage address of a shared data string that is also referred to from other logical addresses within the storage apparatus. As a result, data strings are stored at a plurality of addresses in an order unrelated to the order in which the host computer stored them in the storage apparatus; in other words, fragmentation occurs. Therefore, when the host computer coupled to the storage apparatus reads data after the deduplication, the storage apparatus must read data randomly from a plurality of addresses in order to restore the original data string. The I/O performance of a storage medium such as a hard disk drive (HDD) is lower in random read access than in sequential read access because of a constraint inherent in its operating principle. Consequently, the I/O performance of the storage apparatus deteriorates.
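The effect of fragmentation on a read can be made concrete with a short Python sketch. The function name `count_nonsequential_reads` and the example address lists are hypothetical; the sketch counts only how often a read fails to continue from the immediately preceding storage address, and does not model actual HDD timing.

```python
def count_nonsequential_reads(addresses):
    """Count reads that do not continue from the preceding storage address."""
    return sum(
        1
        for prev, cur in zip(addresses, addresses[1:])
        if cur != prev + 1
    )


# Layout of a file stored without deduplication: one contiguous run.
contiguous = [0, 1, 2, 3]
# Layout of a file whose data strings map to shared copies elsewhere.
fragmented = [1, 3, 0, 7]

print(count_nonsequential_reads(contiguous))
print(count_nonsequential_reads(fragmented))
```

On an HDD, each non-sequential read in this model would correspond to a head movement, which is why the fragmented layout is slower to read back than the contiguous one.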
To solve the problem described above, US 2013/0226881 A1 discloses the following technology. Before deduplication is executed, the access frequency of files accessed by the host and the level of fragmentation within the storage apparatus are monitored, and when the level of fragmentation is high, the deduplication processing is not executed, thereby preventing deterioration of the I/O performance of the storage apparatus. Further, the deterioration of I/O performance can conventionally be alleviated by using, as a storage medium of the storage apparatus, a storage medium having high random access performance, such as a solid state drive (SSD) that uses a semiconductor memory as a storage medium, or by using a large-capacity cache memory.
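A policy of skipping deduplication when fragmentation is high might be sketched in Python as follows. This is an illustration under stated assumptions, not the method of the cited publication: the definition of the fragmentation level as the fraction of non-sequential address transitions, the function names, and the threshold value of 0.5 are all hypothetical. Access frequency is omitted because the text above does not state how it enters the decision.

```python
def fragmentation_level(addresses):
    """Fraction of address transitions that are non-sequential (assumed metric)."""
    if len(addresses) < 2:
        return 0.0
    jumps = sum(
        1
        for prev, cur in zip(addresses, addresses[1:])
        if cur != prev + 1
    )
    return jumps / (len(addresses) - 1)


def should_deduplicate(addresses, threshold=0.5):
    """Return False when the layout is too fragmented (hypothetical threshold)."""
    return fragmentation_level(addresses) < threshold


print(should_deduplicate([0, 1, 2, 3]))  # contiguous layout
print(should_deduplicate([1, 3, 0]))     # every transition is a jump
```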