1. Field of the Invention
The present invention relates to a technique for managing transfer data in internet backup. More particularly, the present invention relates to system control for a backup server that collects backup data from a plurality of users and stores it after de-duplication, the system control reducing the volume of transferred data by transferring only the minimum required, non-duplicate data when new backup data is sent to the backup server.
2. Background Art
In recent years, along with improvements in performance and reductions in the prices of computer systems, the use of computer systems has become widespread across a variety of industries and purposes. Accordingly, it is becoming common to computerize data that was conventionally handled through paper media and the like, and to store it electronically in computer systems. Further, configurations in which a plurality of computer systems are connected by a network are rapidly coming into use. It has thus become possible to realize remote backup, distributed management, and distributed processing of data, and to achieve improved availability, reliability, and performance, which had been difficult to achieve when data was merely stored in a single computer system.
In addition, in recent years, as communication networks have become wider in bandwidth and network connection fees have shifted towards flat rates and lower prices, services using the Internet have also become widely used. At first, such services as browsing web pages and sending/receiving e-mail were mainstream. However, recently, services in which large volumes of data are exchanged, such as data backup services via the Internet (hereinafter referred to as “internet backup”), have also become popular. Conventionally, to back up data, each user had to furnish a backup device individually and manage that device themselves. With internet backup, however, users are able to back up their own data simply by accessing a backup service that is available via the Internet. There is a further advantage in that each user is only required to have a connection to the Internet, and thus furnishing and managing a backup device is unnecessary. It is thus expected that use of such internet backup will become even more widespread in the future.
Conventionally, when backup data is collected from a plurality of users at a backup server, there are cases where identical data is transmitted to and stored on the backup server in duplicate. For example, with respect to popular music files purchased at internet shops and the like, the same music file is often owned by a large number of users. It is therefore likely that the same data exists in duplicate within the backup data.
Patent Document 1 mentioned below discloses a technique for de-duplicating data of the same content with a view to improving the use efficiency of storage on a backup server. In this scheme, digest (summary) information of stored data is calculated in advance at the server storing the data. Then, when new data is to be transmitted to and stored on the server, digest information of that data is sent to the server before the data itself is transmitted. This makes it possible to determine at the server whether or not it is necessary to transfer the data, in other words, whether or not the same data is already stored on the server. If duplication of the data to be transmitted is detected on the server side, it is no longer necessary to send this data to the server. As a result, storage use efficiency is improved through de-duplication at the server, and network use efficiency is improved because duplicate data need not be transmitted via the network.
[Patent Document 1] U.S. Pat. No. 7,272,602
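The digest-first exchange described above for Patent Document 1 can be illustrated with a minimal sketch. It is not taken from Patent Document 1. The names `BackupServer`, `has_digest`, `upload`, and `backup` are hypothetical, the use of SHA-256 as the digest is an assumption, and the network is replaced by direct method calls.

```python
import hashlib


def digest_of(data: bytes) -> str:
    """Return digest (summary) information for the data.

    SHA-256 is an assumption; the text above does not name a digest function.
    """
    return hashlib.sha256(data).hexdigest()


class BackupServer:
    """Hypothetical server that stores each distinct content only once."""

    def __init__(self):
        self._store = {}          # digest -> stored data
        self.bytes_received = 0   # volume of data actually transferred

    def has_digest(self, digest: str) -> bool:
        """Tell the client whether data with this digest is already stored."""
        return digest in self._store

    def upload(self, digest: str, data: bytes) -> None:
        """Receive and store data that was not yet present on the server."""
        if digest_of(data) != digest:
            raise ValueError("digest does not match the uploaded data")
        self.bytes_received += len(data)
        self._store[digest] = data


def backup(server: BackupServer, data: bytes) -> bool:
    """Send the digest first and transfer the data only if the server lacks it.

    Returns True if the data itself was transferred, False if it was skipped.
    """
    digest = digest_of(data)
    if server.has_digest(digest):
        return False  # duplicate detected on the server side; nothing to send
    server.upload(digest, data)
    return True


if __name__ == "__main__":
    server = BackupServer()
    music_file = b"popular music file" * 1000
    print(backup(server, music_file))  # first user: data is transferred
    print(backup(server, music_file))  # second user: only the digest is sent
```

In this sketch, a second user who backs up the same music file sends only the digest, so the server's received byte count does not grow.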