In recent years, storage systems have come into wide use that improve the reliability of data and increase capacity by arranging a plurality of HDDs (Hard Disk Drives) and/or SSDs (Solid State Drives) in a redundant configuration using a RAID (Redundant Arrays of Inexpensive Disks) technique or the like. Moreover, in order to improve the reliability of data, an error detection code (e.g., a CRC (Cyclic Redundancy Check) code) is also added to the data to be written into a memory device, such as an HDD or an SSD. The reliability of a storage system is maintained by such techniques.
While the capacity of memory devices is increasing with advances in technology, the amount of data handled by users is also increasing rapidly. Therefore, techniques for reducing the amount of data to be written into a memory device are under study. One such technique is called de-duplication. This technique identifies, among the data to be written into a memory device or the data already written in a memory device, a plurality of data portions (chunks) having identical content, keeps one of the identified chunks, and removes the other chunks whose content duplicates that of the kept chunk.
In removing the other chunks, the storage system generates information (a reparse point) indicative of the relationship between the remaining chunk and the removed chunks. Then, upon receipt of a read request for one of the removed chunks, the storage system identifies the remaining chunk based on the reparse point and responds using the identified chunk. Applying this de-duplication enables the capacity of a memory device to be utilized efficiently. Moreover, in a memory device such as an SSD that tolerates only a limited number of rewrites, de-duplication contributes to reducing the number of rewrites.
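The de-duplication and reparse-point behavior described above can be illustrated with a minimal sketch. The class name `DedupStore`, the use of a SHA-256 fingerprint to detect identical chunks, and the representation of a reparse point as a mapping from a logical address to a fingerprint are all assumptions made for illustration; the description above does not specify how duplicates are detected or how a reparse point is encoded.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory model of a de-duplicating store."""

    def __init__(self):
        # fingerprint -> chunk content; each distinct content is kept once
        self.chunks = {}
        # logical address -> fingerprint; plays the role of a reparse point
        self.reparse_points = {}

    def write(self, address: int, chunk: bytes) -> None:
        fingerprint = hashlib.sha256(chunk).digest()
        # Keep only the first chunk with this content; later duplicates
        # are not stored again.
        self.chunks.setdefault(fingerprint, chunk)
        self.reparse_points[address] = fingerprint

    def read(self, address: int) -> bytes:
        # Resolve the address to the remaining chunk via the reparse point.
        return self.chunks[self.reparse_points[address]]
```

In this sketch, writing the same 4096-byte chunk to three addresses stores the content once, and a read from any of the three addresses is answered from that single remaining chunk.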
As described above, in order to improve the reliability of data, a check code including a CRC code and information indicative of the write destination of the data may be added to the data to be written into a memory device. As a technique for performing de-duplication on such check-code-attached data, a technique has been proposed that separates the check code from the data, performs de-duplication, then concatenates the check code separated prior to the de-duplication to the de-duplicated data, and writes the resulting data into a memory device.
Note that, with regard to the CRC code, a technique has been proposed for generating a CRC code from the data prior to compression and generating a dummy code such that the CRC code generated from the compressed data with the dummy code added thereto matches the CRC code generated from the data prior to compression. In this technique, the dummy code and the CRC code generated from the data prior to compression are written into a memory device together with the compressed data.
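As an illustration of how such a dummy code could be computed, the following sketch appends four bytes to the compressed data so that the CRC of the result equals the CRC of the uncompressed data. It assumes the CRC-32 variant implemented by `zlib.crc32` and a 4-byte dummy code; the function name `make_dummy_code` is hypothetical, and the method shown is a generally known table-reversal approach, not necessarily the one used in the cited publication.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial used by zlib


def _make_table():
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ POLY if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = _make_table()
# The top byte of each table entry is unique, so it identifies the index.
INDEX_BY_TOP_BYTE = {entry >> 24: i for i, entry in enumerate(TABLE)}


def make_dummy_code(payload: bytes, target_crc: int) -> bytes:
    """Return 4 bytes d such that crc32(payload + d) == target_crc."""
    state = zlib.crc32(payload) ^ 0xFFFFFFFF   # internal register after payload
    wanted = target_crc ^ 0xFFFFFFFF           # internal register to reach

    # Walk backwards from the wanted register to recover the four table
    # indices that must be used while processing the dummy bytes.
    indices = []
    x = wanted
    for _ in range(4):
        i = INDEX_BY_TOP_BYTE[x >> 24]
        indices.append(i)
        x = ((x ^ TABLE[i]) << 8) & 0xFFFFFFFF

    # Walk forwards from the current register, choosing each dummy byte
    # so that the required table index is selected.
    dummy = bytearray()
    for i in reversed(indices):
        dummy.append((state ^ i) & 0xFF)
        state = TABLE[i] ^ (state >> 8)
    return bytes(dummy)


original = b"example data " * 100
compressed = zlib.compress(original)
dummy = make_dummy_code(compressed, zlib.crc32(original))
```

With the dummy code appended, the CRC generated from the data prior to compression can be checked directly against the compressed data plus dummy code, at a cost of four extra bytes per compressed unit in this sketch.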
See, for example, Japanese National Publication of International Patent Application No. 2013-532853 and Japanese Laid-open Patent Publication No. 08-116274.
CRC codes generated from the same data have the same value, but the check code also includes information about the write destination of the data. The write destination often differs even if the content of the data is the same; therefore, if de-duplication is attempted on data that includes the check code, the amount of data that can be removed decreases. Accordingly, in terms of increasing the utilization efficiency of a memory area, it is effective to perform de-duplication after separating the check code.
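The effect of the write destination on de-duplication can be seen in a small sketch. The 8-byte check-code layout used here (a 4-byte CRC-32 followed by a 4-byte block address) and the helper names `attach_check_code` and `split_check_code` are assumptions chosen for illustration only; the actual check-code format is not specified above.

```python
import struct
import zlib

CHECK_CODE_SIZE = 8  # assumed layout: 4-byte CRC-32 + 4-byte write destination


def attach_check_code(data: bytes, address: int) -> bytes:
    """Append a check code (CRC of the data plus its write destination)."""
    return data + struct.pack(">II", zlib.crc32(data), address)


def split_check_code(block: bytes):
    """Separate a block into (data, check_code)."""
    return block[:-CHECK_CODE_SIZE], block[-CHECK_CODE_SIZE:]


# The same 512-byte content written to three different destinations.
data = b"A" * 512
blocks = [attach_check_code(data, address) for address in (10, 20, 30)]

# Comparing whole blocks: the differing addresses make every block distinct.
distinct_with_check_code = len(set(blocks))

# Comparing only the data after separating the check code: one distinct chunk.
distinct_without_check_code = len({split_check_code(b)[0] for b in blocks})
```

Here `distinct_with_check_code` is 3 and `distinct_without_check_code` is 1, so only the second comparison allows two of the three chunks to be removed.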
However, the data remaining after de-duplication may be compressed before being written into a memory device. In this case, even if the check code separated at the time of de-duplication is concatenated to the compressed data and written into a memory device, that check code cannot be used to detect an error occurring in the compressed data, because it was generated from the data prior to compression. Accordingly, the reliability decreases. On the other hand, the above-described technique of generating a dummy code does not take de-duplication into consideration at all.
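The mismatch described above can be shown with a short sketch, assuming CRC-32 as computed by `zlib.crc32` and DEFLATE compression via `zlib.compress`; the helper name `crc_matches` is hypothetical. The CRC separated before de-duplication describes the uncompressed bytes, so comparing it against the compressed bytes says nothing about whether those bytes are intact.

```python
import zlib


def crc_matches(payload: bytes, stored_crc: int) -> bool:
    """Return True if the payload's CRC-32 equals the stored CRC."""
    return zlib.crc32(payload) == stored_crc


data = b"storage system " * 64
stored_crc = zlib.crc32(data)        # check code generated before compression
compressed = zlib.compress(data)     # what is actually written to the device

# The stored CRC validates the original data ...
ok_original = crc_matches(data, stored_crc)

# ... but not the compressed bytes, even though they are uncorrupted.
ok_compressed = crc_matches(compressed, stored_crc)

# The stored CRC becomes usable again only after decompression.
ok_after_decompress = crc_matches(zlib.decompress(compressed), stored_crc)
```

In this sketch the old check code can be applied only after the data has been decompressed, which means an error in the stored compressed bytes is not detected at the point where those bytes are read.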
Therefore, not only for compression but in any case where a storage system modifies the de-duplicated data, it is effective, in terms of maintaining the reliability of the storage system, to provide a mechanism for assuring the reliability of the modified data.