In recent years, the amount of data stored in file servers has been increasing rapidly. To reduce the data storage cost of a file server, a file-level deduplication function, which reduces the amount of file data stored in the file server, is attracting attention.
The file-level deduplication function is realized by extracting, from among the files stored in a file system of the file server, a group of duplicate files whose data bodies are identical, deleting all but one of those data bodies, and replacing each deleted data body with reference data that points to the remaining one. Because the one remaining data body is shared by a plurality of files, the redundant data bodies stored in the file system can be deleted, and the amount of data stored in the file system is reduced accordingly.
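The replacement of duplicate data bodies with reference data can be illustrated with a minimal in-memory sketch. It assumes that duplicates are detected by comparing SHA-256 digests of whole files and that the reference data is simply the digest; the names `deduplicate`, `read_file`, `store`, and `refs` are hypothetical and do not come from any particular file server.

```python
import hashlib


def deduplicate(files):
    """Keep one data body per distinct content; every file keeps only a reference.

    files: dict mapping file name -> data body (bytes).
    Returns (store, refs): store maps digest -> the single shared data body,
    refs maps file name -> reference data (here, the digest; an assumption).
    """
    store = {}
    refs = {}
    for name, body in files.items():
        digest = hashlib.sha256(body).hexdigest()
        # The first file with this content contributes the data body;
        # later duplicates only add a reference to it.
        store.setdefault(digest, body)
        refs[name] = digest
    return store, refs


def read_file(name, store, refs):
    """Resolve a file's reference data back to the shared data body."""
    return store[refs[name]]


files = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
store, refs = deduplicate(files)
```

In this example `a.txt` and `b.txt` share one data body, so `store` holds two bodies for three files, and `read_file` still returns the full content of either file.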
Meanwhile, as a countermeasure against a failure of the file server or a disaster, the data managed by a file system to which file-level deduplication has been applied is still backed up regularly to a tape device, as has conventionally been done. For the backup, the Network Data Management Protocol (NDMP), for example, may be used.
For example, U.S. Pat. No. 8,204,862 B discloses a method for restoring deduplicated data. The method may include receiving a request to restore a set of deduplicated data segments to a client system, where each data segment in the set of deduplicated data segments is referred to by one or more deduplication references. The method may also include procuring reference data that indicates, for each data segment in the set of deduplicated data segments, the number of deduplication references that point to the data segment. The method may further include using the reference data to select one or more data segments from the set of deduplicated data segments for client-side caching, caching the one or more data segments in a cache on the client system, and restoring the one or more data segments from the cache on the client system. Various other methods, systems, and computer-readable media are also disclosed.
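The idea of using reference counts to choose segments for client-side caching can be sketched as follows. This is only an illustration under stated assumptions, not the method claimed in the cited patent: it assumes that the most-referenced segments are the ones selected, that the cache holds a fixed number of segments, and that segments are fetched through a caller-supplied function. The names `select_for_cache`, `restore`, `file_refs`, and `fetch_segment` are hypothetical.

```python
from collections import Counter


def select_for_cache(file_refs, cache_slots):
    """Pick up to cache_slots segment IDs that have more than one reference."""
    # Reference data: number of deduplication references per segment.
    counts = Counter(seg for segs in file_refs.values() for seg in segs)
    # most_common() orders segments by reference count, highest first.
    return {seg for seg, n in counts.most_common(cache_slots) if n > 1}


def restore(file_refs, fetch_segment, cache_slots):
    """Restore every file, fetching each selected segment only once."""
    selected = select_for_cache(file_refs, cache_slots)
    cache = {}     # client-side cache: segment ID -> data
    restored = {}  # file name -> restored content
    for name, segs in file_refs.items():
        parts = []
        for seg in segs:
            if seg in cache:
                data = cache[seg]          # served from the client-side cache
            else:
                data = fetch_segment(seg)  # transferred from the backup source
                if seg in selected:
                    cache[seg] = data
            parts.append(data)
        restored[name] = b"".join(parts)
    return restored


# Hypothetical backup source and a log of which segments were transferred.
segments = {"s1": b"AA", "s2": b"BB", "s3": b"CC"}
fetch_log = []


def fetch_segment(seg):
    fetch_log.append(seg)
    return segments[seg]


file_refs = {"f1": ["s1", "s2"], "f2": ["s1", "s3"], "f3": ["s1"]}
restored = restore(file_refs, fetch_segment, cache_slots=1)
```

Here segment `s1` is referenced three times, so it is the one selected for caching and is transferred only once even though three files need it.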
In addition, U.S. Pat. No. 8,200,926 B discloses a computer-implemented method for creating a full backup. The computer-implemented method may include creating a first full backup of a set of data units at a first time. The computer-implemented method may also include identifying one or more data units in the set of data units that have been modified since the first time. The computer-implemented method may further include creating a second full backup of the set of data units by providing copies of the one or more data units that have been modified since the first time and storing references to copies of one or more data units in the set of data units that have not been modified since the first time. The references may be configured such that the second full backup is a standalone backup that is independent of any other backups.
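A second full backup built from copies of modified data units plus references to existing copies of unmodified ones can be sketched as below. The sketch assumes that all copies live in a shared pool keyed by SHA-256 digest and that each backup is a catalog of references into that pool, which is one way a backup could remain usable without any other backup's catalog. How the cited patent achieves this independence is not stated here, and the names `create_full_backup`, `pool`, and `catalog` are hypothetical.

```python
import hashlib


def create_full_backup(units, pool, previous=None, modified=None):
    """Return a catalog mapping data unit name -> key of its copy in the pool.

    units: dict mapping unit name -> current data (bytes).
    pool: shared dict of stored copies, keyed by digest (an assumption).
    previous: catalog of an earlier full backup, if any.
    modified: names of units changed since the earlier backup.
    """
    modified = modified or set()
    catalog = {}
    for name, data in units.items():
        if previous is not None and name in previous and name not in modified:
            # Unmodified unit: store only a reference to the existing copy.
            catalog[name] = previous[name]
        else:
            # New or modified unit: store a fresh copy and reference it.
            key = hashlib.sha256(data).hexdigest()
            pool[key] = data
            catalog[name] = key
    return catalog


pool = {}
first = create_full_backup({"u1": b"one", "u2": b"two"}, pool)

current = {"u1": b"one", "u2": b"TWO"}
second = create_full_backup(current, pool, previous=first, modified={"u2"})

# The second catalog points into the pool directly, so it can be restored
# even after the first catalog has been discarded.
del first
restored = {name: pool[key] for name, key in second.items()}
```

Only the modified unit `u2` is copied again for the second backup, yet `restored` contains every unit because the unmodified `u1` is reached through its reference.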