NAS (Network Attached Storage) is a storage device suitable for sharing file data among a plurality of computers via a network. Currently, many file data storage systems utilize NAS devices.
The amount of data stored in high-performance primary file servers is increasing rapidly. Accordingly, the number and size of the disks coupled to a file server are increasing, as are the costs of purchasing and maintaining the disks. In order to reduce the costs spent on disks, de-duplication techniques for reducing the amount of data stored in a primary file server are attracting attention. De-duplication can be classified into block level de-duplication, in which de-duplication is performed in block units, and file level de-duplication, in which de-duplication is performed in file units. The file level de-duplication technique is specifically referred to as single instantiation.
Single instantiation reduces the physical data capacity by consolidating the data of a group of files whose entire file data are identical into a single file. Since the processing is performed for each file, single instantiation applies only a small load on the system compared to block level de-duplication, so that it is easily applied to a primary file server. A general method for realizing single instantiation is disclosed in patent literature 1. Whether files are duplicates capable of being subjected to single instantiation is generally determined by calculating a hash value of each file, comparing the hash values, and further subjecting the files having identical hash values to binary comparison, since identical hash values alone do not guarantee identical file data.
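The duplicate determination described above (hash calculation, hash comparison, then binary comparison) can be illustrated with a minimal sketch. It is not the method of patent literature 1. The function names `file_digest` and `find_duplicate_groups` are hypothetical, and the use of SHA-256 as the hash function is an assumption, as the text does not name one.

```python
import filecmp
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return a hex digest of the file contents (SHA-256 is an assumed choice)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(paths):
    """Group files whose whole contents are identical.

    Step 1: bucket files by hash value.
    Step 2: confirm each bucket by byte-for-byte (binary) comparison,
            because equal hash values do not prove equal contents.
    """
    by_hash = defaultdict(list)
    for p in paths:
        by_hash[file_digest(p)].append(p)

    groups = []
    for candidates in by_hash.values():
        confirmed = []  # each entry is a list of files verified identical
        for p in candidates:
            for group in confirmed:
                # shallow=False forces a comparison of the file contents.
                if filecmp.cmp(group[0], p, shallow=False):
                    group.append(p)
                    break
            else:
                confirmed.append([p])
        # Only groups with two or more files can be unified to one file.
        groups.extend(g for g in confirmed if len(g) > 1)
    return groups
```

Each returned group is a set of files that could be unified to one file. How the unification itself is carried out (for example, by replacing duplicates with references to a single stored copy) is outside the scope of this sketch.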
Further, since subjecting files having greater sizes to single instantiation exerts a greater data space reduction effect, patent literature 1 further discloses an art of restricting the target files for duplication determination to those having a certain size or greater.
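The size restriction can be sketched as a simple pre-filter applied before duplication determination. The following is an illustration under assumptions, not the implementation of patent literature 1. The names `select_candidates` and `reclaimable_bytes` and the threshold parameter `min_size_bytes` are hypothetical, and the text does not state a specific threshold value.

```python
import os


def select_candidates(paths, min_size_bytes):
    """Keep only files at or above the size threshold (threshold is hypothetical)."""
    return [p for p in paths if os.path.getsize(p) >= min_size_bytes]


def reclaimable_bytes(group):
    """Space saved when a group of identical files is unified to one file.

    For n identical files of size s, one copy is kept, so (n - 1) * s bytes
    are saved. This is why larger files give a greater reduction effect.
    """
    if len(group) < 2:
        return 0
    return (len(group) - 1) * os.path.getsize(group[0])
```

For instance, three identical files of 1 GiB each would free 2 GiB when unified, whereas three identical files of 1 KiB each would free only 2 KiB for a comparable determination effort per file.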