Because of the characteristics of flash memory, update data cannot be directly overwritten onto the physical area where the original data is stored. To rewrite data, the update data must instead be written to an area different from the area where the original data is stored. As a result, the same data comes to be written in a plurality of areas. When the blocks become full, only the newest data stored in those blocks is transferred to unused blocks, and a delete (erase) process is then performed on the blocks that no longer store the newest data, thereby creating free blocks. In the following description, this process is called a reclamation process. For this reason, a storage device equipped with flash memories provides a logical address layer, whose addresses differ from physical addresses, as the address layer visible to higher-level devices that use the storage device, such as host computers, and receives access requests to logical addresses from those higher-level devices. When data is stored, the physical addresses allocated to the logical addresses are changed as needed. With this method, a logical address does not change even if the corresponding physical address changes, so higher-level devices can access the data without being aware that the physical address of the write data has changed, and high usability can be maintained.
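The out-of-place write, the logical-to-physical mapping, and the reclamation process described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the implementation of any actual flash package: the class name FlashSketch, the block and page counts, and the policy of choosing the block with the fewest valid pages as the reclamation target are all hypothetical.

```python
class FlashSketch:
    """Toy model: a page can be written once and is reusable only after its block is erased."""

    def __init__(self, num_blocks=3, pages_per_block=2):
        self.ppb = pages_per_block
        # None marks an erased (writable) page.
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        self.used = [0] * num_blocks   # pages written in each block since its last erase
        self.l2p = {}                  # logical address -> (block, page)
        self.active = 0                # block currently receiving writes
        self.erases = 0

    def _valid(self, block):
        """Newest data still held in a block, as (logical address, data) pairs."""
        return [(la, self.blocks[b][p])
                for la, (b, p) in self.l2p.items() if b == block]

    def _program(self, la, data):
        page = self.used[self.active]
        assert self.blocks[self.active][page] is None, "no overwrite in place"
        self.blocks[self.active][page] = data
        self.used[self.active] += 1
        # Only the mapping changes; the logical address seen by the host does not.
        self.l2p[la] = (self.active, page)

    def _reclaim(self, spare):
        """Move the newest data of one full block to the spare block, then erase that block."""
        candidates = [b for b in range(len(self.blocks)) if b != spare]
        victim = min(candidates, key=lambda b: len(self._valid(b)))  # assumed policy
        live = self._valid(victim)
        if len(live) == self.ppb:
            raise RuntimeError("no stale pages to reclaim")
        self.active = spare
        for la, data in live:
            self._program(la, data)
        self.blocks[victim] = [None] * self.ppb
        self.used[victim] = 0
        self.erases += 1

    def write(self, la, data):
        if self.used[self.active] == self.ppb:
            free = [b for b in range(len(self.blocks)) if self.used[b] == 0]
            if len(free) > 1:
                self.active = free[0]
            else:
                self._reclaim(free[0])   # keep one erased block in reserve
        self._program(la, data)

    def read(self, la):
        block, page = self.l2p[la]
        return self.blocks[block][page]
```

In this sketch, repeatedly writing to the same logical address moves the data to a new physical page each time, while a read of that logical address always returns the newest data.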
Since storage devices having flash memories as storage media are far faster than HDDs and the like, their use has been spreading widely in recent years as bit costs have fallen. In storage systems generally used in companies and the like, high reliability is realized by providing a plurality of HDDs or other storage devices and having the controller in the system store data redundantly across the plurality of storage devices, and high performance is realized by having the plurality of storage devices perform processes in parallel. Therefore, when storage devices having flash memories as storage media are used in a storage system, a plurality of such storage devices are generally installed in the storage system, and a storage controller is configured to control them. Some storage devices having flash memories as storage media have a form factor or interface compatible with HDDs, and these are called SSDs (Solid State Disks). However, there are also storage devices that are not compatible with HDDs. The storage devices having flash memories as storage media described in the present invention include both types, and in the following description, both are collectively referred to as flash packages.
However, flash memories have higher bit costs than magnetic disks and the like, so there is a strong demand to reduce the amount of stored data and increase the apparent capacity (the amount of data that can be stored from higher-level devices such as host computers). In the technical field of storage systems, deduplication exists as a technology for reducing the amount of stored data. In this technology, the storage controller checks whether data having the same contents exist among the plurality of data stored in the storage devices within the system, and if a plurality of data have the same contents, only one of them is retained and the other identical data are not stored (are deleted), thereby reducing the amount of data to be stored in the storage devices. If the contents of all data were compared directly to find data having the same contents, the amount of calculation would become excessive. A method is therefore often adopted in which a representative value, such as a hash value, is calculated for each data using a hash function, and the comparison process is performed only between data having the same representative value. The method for calculating the representative value is not restricted to a hash function and can be any method, as long as the values calculated from the same data are always the same. In the following description, the representative value, such as the hash value, used in the deduplication technology is called a feature value.
Patent Literature 1 discloses an example of a data deduplication technology. Patent Literature 1 discloses a storage system equipped with a plurality of flash memory modules (flash packages), wherein the storage controller or the flash memory module calculates the hash value of write target data. If the hash value of data already stored in the flash memory module is equal to the hash value of the write target data, the flash memory module further compares the data stored in that flash memory module with the write target data on a bit-by-bit basis. When the data match, the write target data is not written to the physical block of the flash memory module, whereby the number of rewrites of the flash memory can be reduced.
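The two-step check described above, in which feature values are compared first and a full comparison is performed only when the feature values are equal, can be sketched as follows. This is a minimal illustration under stated assumptions and does not reproduce the method of Patent Literature 1: the class name DedupStore, the use of SHA-256 as the feature value, and the in-memory tables are all hypothetical.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store keyed by a feature value (here, a SHA-256 hash)."""

    def __init__(self):
        self.chunks = []       # data that is physically stored
        self.by_feature = {}   # feature value -> indexes into self.chunks
        self.refs = {}         # logical address -> index into self.chunks

    @staticmethod
    def feature(data: bytes) -> bytes:
        # Any function that always yields the same value for the same data would do.
        return hashlib.sha256(data).digest()

    def write(self, la, data: bytes) -> bool:
        """Return True if the data was physically stored, False if it was deduplicated."""
        fv = self.feature(data)
        for idx in self.by_feature.get(fv, []):
            # Equal feature values do not prove equal contents, so compare in full.
            if self.chunks[idx] == data:
                self.refs[la] = idx
                return False
        self.chunks.append(data)
        idx = len(self.chunks) - 1
        self.by_feature.setdefault(fv, []).append(idx)
        self.refs[la] = idx
        return True

    def read(self, la) -> bytes:
        return self.chunks[self.refs[la]]
```

In this sketch, writing the same contents to two logical addresses stores the data only once, and both addresses refer to the single stored copy.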
On the other hand, capacity virtualization technology is spreading widely in storage systems. Capacity virtualization is a technology for showing the host side a capacity (virtual capacity) greater than the physical capacity of the storage devices installed in the storage system, and it is generally realized by the storage controller in the storage system. This technology utilizes the characteristic that, when a user actually uses the storage, the amount of data actually stored in the system does not easily reach the capacity of the volume defined by the user (the capacity of the storage device as seen from the user). In other words, without capacity virtualization, a physical storage area corresponding to the size of the whole storage area of the defined volume is allocated when the volume is defined, whereas with capacity virtualization, a physical storage area is allocated only when data is actually stored in the storage system. With this arrangement, the necessary amount of physical storage area can be reduced, and the user is not required to define the volume capacity strictly and can simply define a value with a large margin, so that the usability of the system is enhanced. Patent Literature 2 discloses a storage system having a storage controller coupled to a plurality of flash packages, wherein not only the storage controller but also the flash packages are equipped with a capacity virtualization technology. Patent Literature 2 further discloses a technique in which the flash packages have a function to compress and store data, and to change the virtual capacity that the flash package shows to the storage controller (lower-level virtual capacity) in response to changes in the compression rate.
Therefore, the flash package can show to the storage controller a capacity greater than the actual physical capacity of the flash memories. In Patent Literature 2, the capacity virtualization technology executed by the storage controller is called a higher-level capacity virtualization function, and the capacity virtualization technology executed within the flash packages is called a lower-level capacity virtualization function, thereby distinguishing the two virtualization technologies.
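The allocation behavior of capacity virtualization described above, in which a physical storage area is allocated only when data is actually stored, can be sketched as follows. This is a simplified illustration and not the implementation disclosed in Patent Literature 2: the class name ThinVolume, the page-granular allocation, and the in-memory tables are hypothetical.

```python
class ThinVolume:
    """Toy volume whose virtual capacity may exceed its physical capacity."""

    def __init__(self, virtual_pages, physical_pages):
        self.virtual_pages = virtual_pages             # capacity shown to the host side
        self.free_pages = list(range(physical_pages))  # unallocated physical pages
        self.table = {}                                # virtual page -> physical page
        self.store = {}                                # physical page -> stored data

    def allocated(self):
        """Number of physical pages actually in use."""
        return len(self.table)

    def write(self, vpage, payload):
        if not 0 <= vpage < self.virtual_pages:
            raise IndexError("outside the virtual capacity")
        if vpage not in self.table:
            # Allocate a physical page only on the first write to this virtual page.
            if not self.free_pages:
                raise RuntimeError("physical storage area exhausted")
            self.table[vpage] = self.free_pages.pop(0)
        self.store[self.table[vpage]] = payload

    def read(self, vpage):
        if not 0 <= vpage < self.virtual_pages:
            raise IndexError("outside the virtual capacity")
        ppage = self.table.get(vpage)
        return None if ppage is None else self.store[ppage]
```

In this sketch, a volume defined with 1000 virtual pages consumes no physical pages until a write occurs, and writes fail only when the physical pages themselves run out.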
When compression is performed, the data length after compression differs with each data update, so there are many cases where the updated data cannot be stored in the area where the data was originally stored. Since, owing to the properties of flash memory, updated data must be stored in a new area in any case, realizing the compression function in a flash package is considered to make good use of those properties.
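The point that the data length after compression changes from one update to the next can be checked with a short sketch. It assumes zlib as the compression method and a 4096-byte page purely for illustration; the function names compressed_len and fits_in_old_area are hypothetical.

```python
import random
import zlib

PAGE = 4096  # assumed size of one uncompressed page, in bytes


def compressed_len(data: bytes) -> int:
    """Length of the data after compression with zlib."""
    return len(zlib.compress(data))


def fits_in_old_area(old: bytes, new: bytes) -> bool:
    """True if the compressed update is no longer than the compressed original."""
    return compressed_len(new) <= compressed_len(old)


# Two versions of the same page: highly repetitive contents, then noisy contents.
repetitive = b"\x00" * PAGE
rng = random.Random(0)  # fixed seed so the example is reproducible
noisy = bytes(rng.getrandbits(8) for _ in range(PAGE))
```

Here both versions have the same uncompressed length, yet the noisy version compresses far less than the repetitive one, so an update from the repetitive contents to the noisy contents would not fit in the area that held the original compressed data.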