1. Field of the Invention
The present invention relates to a method, system, device, and program for transferring duplicate files in a hierarchical storage management system.
2. Description of the Related Art
Hierarchical Storage Management (HSM) is a technology that, in a system including a plurality of file storage devices, such as storage apparatuses and servers, with different performance and functions, migrates files between the file storage devices according to the usage state of each file. As shown in US Patent No. 2004/0193760, in a computer system implementing HSM (hereinafter referred to as an HSM system), frequently accessed files are stored in a file storage device with high performance and high bit cost belonging to a higher hierarchy (hereinafter referred to as an upper Tier), while less frequently accessed files are stored in a file storage device with low performance and low bit cost belonging to a lower hierarchy (hereinafter referred to as a lower Tier). This makes it possible to present to a client computer what appears to be a large-scale, high-speed storage device at a lower cost. Incidentally, the files stored in the lower Tier are typically files that are updated less frequently, for example, a backup file that the user keeps in case anything goes wrong, a file that the user no longer refers to frequently, and a file that the user must retain for a legally prescribed period.
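The placement rule described above may be illustrated by the following minimal sketch. It is not taken from the cited reference; the function names, the use of a simple access count, and the threshold value are all assumptions introduced only for illustration.

```python
def assign_tier(access_count, threshold=10):
    """Pick a tier from an access count (hypothetical rule, assumed threshold)."""
    return "upper" if access_count >= threshold else "lower"


def plan_migrations(files, threshold=10):
    """Return the moves needed so that every file sits in its target tier.

    files maps a path to a (current_tier, access_count) pair.
    Each move is a (path, source_tier, destination_tier) tuple.
    """
    moves = []
    for path, (tier, count) in sorted(files.items()):
        target = assign_tier(count, threshold)
        if target != tier:
            moves.append((path, tier, target))
    return moves


if __name__ == "__main__":
    catalog = {
        "/db/hot.log": ("lower", 50),   # frequently accessed, should move up
        "/bak/old.tar": ("upper", 0),   # rarely accessed, should move down
        "/doc/a.txt": ("upper", 20),    # already in the right tier
    }
    for move in plan_migrations(catalog):
        print(move)
```

An actual HSM system would typically weigh further factors, such as file age or a retention policy, rather than a single counter.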
There is known a de-duplication technology for reducing the required data storage capacity by eliminating duplicate data. In a file storage device in which a target file is specified by path name and file name, as in a network file system (NFS) or the like, the de-duplication technology includes the following three steps:
(A) Find a group of files with the same data content, from a plurality of files stored in the file storage device;
(B) Keep at least one copy of the real data for the group of files with the same data content, and delete the remaining copies; and
(C) In response to a read request that specifies a file included in the group of files, identify the retained data corresponding to the specified file, and transmit the identified data.
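Steps (A) to (C) may be sketched as follows. This is an illustrative sketch only: the in-memory dictionaries, the function names, and the use of a SHA-256 digest to detect identical content are assumptions, and a practical implementation might additionally compare the data byte by byte before treating two files as duplicates.

```python
import hashlib
from collections import defaultdict


def deduplicate(files):
    """De-duplicate files given as a dict mapping path -> bytes.

    Returns (store, index): store maps a content digest to the single
    retained copy, and index maps each path to the digest of its content.
    """
    # Step (A): find groups of files with the same data content.
    groups = defaultdict(list)
    for path, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(path)

    # Step (B): keep one copy of the real data per group; the other
    # copies are simply not carried over into the store.
    store = {}
    index = {}
    for digest, paths in groups.items():
        store[digest] = files[paths[0]]
        for path in paths:
            index[path] = digest
    return store, index


def read_file(store, index, path):
    """Step (C): identify the retained data for the path and return it."""
    return store[index[path]]


if __name__ == "__main__":
    files = {"/a/x.txt": b"hello", "/b/y.txt": b"hello", "/c/z.txt": b"world"}
    store, index = deduplicate(files)
    print(len(store), read_file(store, index, "/b/y.txt"))
```

Because the path-to-data mapping is kept separately from the data itself, a client can continue to specify a file by path name and file name after the duplicate copies have been deleted.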
US Patent No. 2008/0243769A1 discloses a method for transferring data from a storage having a de-duplication function for backup data to a backup storage having no de-duplication function, by restoring the de-duplicated data to its non-de-duplicated form.
US Patent No. 2008/0244204A1 discloses a method for duplicating a storage area between backup servers in a network including a plurality of backup servers each having a de-duplication function. This technology reduces the traffic between the backup servers as follows: one backup server transfers duplicate identification information for the data in the storage area to be duplicated to another backup server, the other backup server detects duplication based on the duplicate identification information, and only the non-duplicate data is then transferred.
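The general idea of exchanging identification information before the data itself may be sketched as follows. This sketch does not reproduce the protocol of the cited reference; the use of SHA-256 digests as the duplicate identification information, the block granularity, and all function names are assumptions made for illustration.

```python
import hashlib


def digest(data):
    """Duplicate identification information for one block (assumed: SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def offer(source_blocks):
    """Source side: describe the area to be duplicated by digests only."""
    return [digest(block) for block in source_blocks]


def missing(offered, dest_store):
    """Destination side: report which offered digests it does not hold yet."""
    return {d for d in offered if d not in dest_store}


def replicate(source_blocks, dest_store):
    """Copy source_blocks into dest_store, sending only non-duplicate blocks.

    Returns (layout, bytes_sent): layout is the ordered digest list from
    which the destination can rebuild the area, and bytes_sent counts only
    the block data that actually had to cross the network.
    """
    layout = offer(source_blocks)
    needed = missing(layout, dest_store)
    bytes_sent = 0
    for block in source_blocks:
        d = digest(block)
        if d in needed:
            dest_store[d] = block
            needed.discard(d)  # send each distinct block at most once
            bytes_sent += len(block)
    return layout, bytes_sent


if __name__ == "__main__":
    source = [b"aaaa", b"bbbb", b"aaaa", b"cccc"]
    dest = {digest(b"bbbb"): b"bbbb"}  # destination already holds one block
    layout, sent = replicate(source, dest)
    print(sent, [dest[d] for d in layout] == source)
```

In this sketch the digest list is small relative to the block data, so the saving grows with the share of blocks that the destination already holds.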