In recent years, techniques have been proposed that use de-duplication or compression to reduce the amount of stored data and the number of writes to a NAND (Not AND)-type flash memory incorporated in a storage device, such as a solid state drive (SSD), thereby prolonging the lifetime of the storage device. When data is written into a NAND-type flash memory, such a de-duplication technique divides the data to be written into data units of a predetermined size, referred to as chunks, and identifies duplicate candidates among the chunks by using a hash value calculated for each chunk with a particular algorithm, for example, a hash function. A duplication removing method has also been proposed that checks, in a back-end process, whether a duplicate candidate is actually a duplicate, and removes one copy when a duplicate is found. Various algorithms are available for determining a hash value from the data of one chunk, for example, Message Digest 5 (MD5), Secure Hash Algorithm 1 (SHA-1), and SHA-2.
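The chunk-based duplicate detection described above can be illustrated with a minimal sketch. It assumes a 4096-byte chunk size and SHA-1 as the hash function, and it performs the byte-for-byte verification inline rather than in a separate back-end process. The names `split_chunks`, `deduplicate`, `hash_list`, `stored`, and `layout` are hypothetical and chosen only for illustration; they do not come from any particular SSD controller.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes (illustrative)


def split_chunks(data, size=CHUNK_SIZE):
    """Divide the data to be written into chunks of a predetermined size."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data, size=CHUNK_SIZE):
    """Return (stored, layout): the unique chunks and, for each input chunk,
    the index of the stored chunk that holds its contents."""
    hash_list = {}  # SHA-1 digest -> index into `stored` (hypothetical hash list)
    stored = []     # chunks that would actually be written
    layout = []     # one index per input chunk

    for chunk in split_chunks(data, size):
        digest = hashlib.sha1(chunk).digest()  # 20-byte hash value
        candidate = hash_list.get(digest)
        # A matching hash only marks a duplicate *candidate*; the contents are
        # compared before the chunk is treated as a real duplicate.
        if candidate is not None and stored[candidate] == chunk:
            layout.append(candidate)
        else:
            stored.append(chunk)
            hash_list[digest] = len(stored) - 1
            layout.append(len(stored) - 1)
    return stored, layout


data = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE
stored, layout = deduplicate(data)
print(len(stored), layout)
```

In this sketch the third chunk repeats the first, so only two chunks are stored and the third is recorded as a reference to the first. The hash list here grows without bound, which is exactly the assumption a real device cannot make.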
However, a duplication search between chunks using hash values usually relies on a hash list of limited size that stores past hash values. When the size of the hash list is insufficient, the search cannot cover a wide range, and duplicate data cannot be detected effectively. For example, assume that the data length of a chunk is 4 [KB (kilobytes)] and that each entry of the hash list stores a hash value of 20 [B] and an address of 8 [B] indicating a storage destination in the NAND-type flash memory. Assuming that the hash list holds 2M entries, the size of the hash list is 28 [B]×2M=56 [MB]. However, the range of the NAND-type flash memory that this hash list can cover is, in principle, only 2M×4 [KB]=8 [GB]. Thus, with this hash list, it is basically very difficult to detect duplicates of data written 8 [GB] or more in the past.
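The sizing arithmetic in this example can be reproduced with a short sketch. It assumes binary units, that is, 2M entries taken as 2×2^20 and 4 [KB] taken as 4096 bytes, which is the interpretation under which the figures of 56 [MB] and 8 [GB] come out exactly. The function name `hash_list_footprint` and its parameters are hypothetical.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def hash_list_footprint(entries, hash_bytes=20, addr_bytes=8, chunk_bytes=4 * KIB):
    """Return (list_bytes, covered_bytes) for a hash list with `entries` entries.

    list_bytes    : memory needed to hold the hash list itself
    covered_bytes : amount of written data the list can cover, one chunk per entry
    """
    list_bytes = entries * (hash_bytes + addr_bytes)  # 28 B per entry by default
    covered_bytes = entries * chunk_bytes
    return list_bytes, covered_bytes


entries = 2 * 1024 * 1024  # "2M" entries, assumed to mean 2 x 2^20
list_bytes, covered_bytes = hash_list_footprint(entries)
print(list_bytes / MIB, covered_bytes / GIB)  # 56.0 8.0
```

Under these assumptions, each additional 8 [GB] of search range costs a further 56 [MB] of hash list, so the list size grows linearly with the range to be covered.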
Such a hash list is frequently accessed, and thus storing it in a NAND-type flash memory is impractical. Although an SSD often includes a dynamic random access memory (DRAM) in addition to a NAND-type flash memory, the DRAM is used as a work memory for the control between a host and the NAND-type flash memory, which is the essential function of the SSD. Thus, it is difficult to allocate a large-capacity hash list for duplication detection in the DRAM.