Demand for reducing the cost of storage systems is strong. Expectations are therefore high for arithmetic compression and de-duplication, both of which can reduce the amount of data to be stored in the system.
The size of data to which arithmetic compression has been applied differs from the size of the original data. As a result, the size of the logical address range of the original data differs from the size of the physical address range of the compressed data. When de-duplication is applied, at least one piece among multiple pieces of redundant data is removed, and the physical address of the remaining piece of data is associated with the logical address of the removed piece of redundant data. For these reasons, a storage system that adopts at least one of arithmetic compression and de-duplication adopts a log-structured scheme, in which data is additionally written to a physical address different from the logical address (the log-structured scheme may also be adopted in a storage system that adopts neither arithmetic compression nor de-duplication).
The log-structured scheme invalidates the storage area holding old data in each of the following operations: update writing, which updates at least a part of data already stored; de-duplication, which eliminates at least one piece of data among pieces of data that are redundant with each other; and post-process arithmetic compression, which compresses data already stored. The old data is, for example, a piece of data not yet updated by the update writing, a piece of data regarded as a de-duplication target by the de-duplication, or a piece of data not yet arithmetically compressed. The invalidated area becomes a free area, so such invalidation causes fragmentation of the free area. A storage system that adopts the log-structured scheme therefore requires garbage collection (GC), which collects the fragmented free area (fragmented invalid area).
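The way additional writing and de-duplication leave invalid areas behind can be illustrated with a minimal sketch. It is not taken from any cited document; the class `LogStructuredStore`, its fields, and the use of one list slot per physical address are assumptions made only for illustration.

```python
from typing import Dict, List, Set


class LogStructuredStore:
    """Toy append-only store: every write lands at a fresh physical address."""

    def __init__(self) -> None:
        self.log: List[bytes] = []        # list index = physical address (assumed)
        self.l2p: Dict[int, int] = {}     # logical address -> physical address
        self.invalid: Set[int] = set()    # physical addresses holding old data

    def _invalidate_if_unreferenced(self, physical: int) -> None:
        # An area stays valid while any logical address still points at it.
        if physical not in self.l2p.values():
            self.invalid.add(physical)

    def write(self, logical: int, data: bytes) -> int:
        """Append data; the area holding the previous version becomes invalid."""
        old = self.l2p.get(logical)
        physical = len(self.log)
        self.log.append(data)
        self.l2p[logical] = physical
        if old is not None:
            self._invalidate_if_unreferenced(old)
        return physical

    def deduplicate(self, keep: int, drop: int) -> None:
        """Associate logical address `drop` with the physical data of `keep`."""
        old = self.l2p[drop]
        self.l2p[drop] = self.l2p[keep]
        self._invalidate_if_unreferenced(old)


store = LogStructuredStore()
store.write(0, b"A")                 # physical address 0
store.write(1, b"B")                 # physical address 1
store.write(2, b"A")                 # physical address 2, redundant with address 0
store.write(1, b"C")                 # update writing: physical address 1 becomes invalid
store.deduplicate(keep=0, drop=2)    # de-duplication: physical address 2 becomes invalid
```

After these calls the invalid areas (physical addresses 1 and 2) sit between valid ones (0 and 3), which is the kind of fragmentation that GC has to collect.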
PTL 1 relates to a GC control method. The technology of PTL 1 identifies valid data (live data that is not invalid) using a physical-to-logical table for deriving a logical address from a physical address, identifies the logical address of the valid data using a logical-to-physical table for deriving a physical address from a logical address, copies the valid data into another area, and updates the physical-to-logical table and the logical-to-physical table so as to associate the copy-destination physical address with the logical address of the valid data.
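The copy procedure described for PTL 1 can be sketched as follows. This is a hedged illustration rather than the method of PTL 1 itself: the function `collect`, the dictionary-based tables, and in particular the validity test (data is treated as valid only when the logical-to-physical table still points back at the physical address) are assumptions. The sketch also assumes one logical address per physical address, which does not hold once de-duplication shares data.

```python
from typing import Dict, List


def collect(source: List[int], free: List[int], media: Dict[int, bytes],
            p2l: Dict[int, int], l2p: Dict[int, int]) -> List[int]:
    """Copy valid data out of `source` and return the reclaimed addresses."""
    for src in source:
        # Physical-to-logical lookup: which logical address was written here?
        logical = p2l.pop(src, None)
        # Assumed validity test: the logical-to-physical table must still
        # map that logical address to this physical address.
        if logical is not None and l2p.get(logical) == src:
            dst = free.pop(0)           # copy-destination physical address
            media[dst] = media[src]     # copy the valid data
            l2p[logical] = dst          # update the logical-to-physical table
            p2l[dst] = logical          # update the physical-to-logical table
        media.pop(src, None)            # the source area is now free
    return list(source)


media = {0: b"old", 1: b"B", 2: b"new"}
p2l = {0: 7, 1: 8, 2: 7}    # physical address 0 holds a stale copy of logical address 7
l2p = {7: 2, 8: 1}
reclaimed = collect([0, 1], [10, 11], media, p2l, l2p)
```

In this example only the data at physical address 1 is copied (to address 10), because the data at address 0 is no longer referenced by the logical-to-physical table.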
The technology of NPL 1 divides a physical address space into segments each having a certain size, and selects a segment having a high GC efficiency as the copy source. More specifically, the technology of NPL 1 formalizes the GC efficiency on a segment-by-segment basis by Expression 1, which expresses the concept of GC efficiency. The live data space, which is the amount of copy-target data in a segment, is regarded as the cost “u”. The segment free space (1−u) multiplied by a coefficient “a” (age) that represents the oldness of the segment is regarded as the benefit. The ratio of the benefit to the cost is regarded as the GC efficiency (benefit/cost). Here, the oldness of a segment is the average of timestamps of pieces of data in the segment.

$$\frac{\text{benefit}}{\text{cost}} = \frac{\text{free space} \times \text{age}}{\text{live data space}} = \frac{(1-u) \times a}{u} \qquad \text{(Expression 1)}$$
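As a worked illustration of Expression 1, the following minimal sketch computes the GC efficiency of a few segments and picks the copy source. The `Segment` type, the function names, the sample values, and the handling of a segment with no live data (u = 0) are assumptions for illustration and are not taken from NPL 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    u: float      # live data space as a fraction of the segment (0 <= u <= 1)
    age: float    # coefficient "a" representing the oldness of the segment


def gc_efficiency(seg: Segment) -> float:
    """Benefit/cost according to Expression 1: (1 - u) * a / u."""
    if seg.u == 0:
        # Assumption: with no live data there is nothing to copy, so the
        # cost is zero and the segment is treated as maximally efficient.
        return float("inf")
    return (1.0 - seg.u) * seg.age / seg.u


def pick_copy_source(segments: List[Segment]) -> Segment:
    """Select the segment having the highest GC efficiency."""
    return max(segments, key=gc_efficiency)


segments = [
    Segment("s0", u=0.75, age=10.0),   # efficiency 0.25 * 10 / 0.75, about 3.33
    Segment("s1", u=0.25, age=2.0),    # efficiency 0.75 * 2 / 0.25 = 6.0
    Segment("s2", u=0.5, age=8.0),     # efficiency 0.5 * 8 / 0.5 = 8.0
]
chosen = pick_copy_source(segments)
```

With these sample values s2 is selected even though s1 has more free space, because the age coefficient weights older segments more heavily.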