1. Field of the Invention
This invention pertains in general to backing up digital data.
2. Description of the Related Art
In an environment including several client computers, such as a corporate local area network (LAN), a centralized backup system is often employed. The centralized backup system is configured by a system administrator to automatically back up data on the storage devices of the client computers. The centralized backup system may include a backup server that periodically performs backups of each of the client computers by copying data segments (e.g., files or portions of files) from the storage devices of the client computers to a central repository. This central repository has a large storage capacity and is often located far from the backup server (and client computers). As a result, data segments to be backed up often need to be transmitted over a significant distance through a link that may have limited bandwidth.
One technique of reducing the amount of data transmitted to and stored in the central repository is referred to as “deduplication.” A backup system employing deduplication takes advantage of the fact that a data segment found on a client computer during a backup is likely already stored in the central repository. It may already be in the central repository because it was backed up from the client computer during a previous backup and the data segment on the client computer has not been modified since then. It may also already be in the central repository because the data segment was previously backed up from another client computer having the same data segment.
When backing up a data segment from a client computer using deduplication, the backup server generates information that identifies the data segment. The backup server transmits this identifying information to the central repository, and the central repository sends a response indicating whether it already contains the identified data segment. The backup server then transmits the actual data segment to the central repository only if the response indicates that the data segment is not already contained in the repository. As a result, the number of data segments transmitted to the central repository is reduced.
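The query-then-send exchange described above can be illustrated with a minimal Python sketch. It is not taken from any particular backup product: the `CentralRepository` class, the `backup` function, and the use of a SHA-256 digest as the identifying information are all assumptions made for illustration, and the repository is modeled as an in-memory dictionary rather than a remote system.

```python
import hashlib


class CentralRepository:
    """Hypothetical in-memory stand-in for the central repository."""

    def __init__(self):
        self.segments = {}  # identifying information -> segment bytes
        self.queries = 0    # number of "do you have this?" requests answered

    def contains(self, fingerprint):
        # Each call models one request/response round trip.
        self.queries += 1
        return fingerprint in self.segments

    def store(self, fingerprint, segment):
        self.segments[fingerprint] = segment


def fingerprint(segment):
    # Assumption: a SHA-256 digest serves as the identifying information.
    return hashlib.sha256(segment).hexdigest()


def backup(segments, repo):
    """Back up segments, sending only those the repository lacks.

    Returns the number of segments actually transmitted.
    """
    sent = 0
    for segment in segments:
        fp = fingerprint(segment)
        if not repo.contains(fp):    # query the repository first
            repo.store(fp, segment)  # transmit only when it is missing
            sent += 1
    return sent


repo = CentralRepository()
first = backup([b"alpha", b"beta", b"alpha"], repo)
second = backup([b"alpha", b"gamma"], repo)
print(first, second, repo.queries)
```

In this sketch the first backup transmits two segments and the second transmits one, yet the repository answers a query for every segment in both backups, including those it already holds.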
However, the backup still consumes significant computing and network resources, since the backup server transmits identifying information to the central repository for each data segment. In addition, for each data segment, the central repository uses computing resources to generate a response and transmits that response to the backup server. Therefore, there is a need in the art for a way to decrease the computing and network resources required for performing backups using deduplication.