Despite the overall improvement in the reliability of data storage devices (e.g., disk drives), it remains necessary to implement backup systems to protect against data loss. In a typical backup system, a backup agent executing on a primary, or source, computer system identifies and reads the data to be backed up, and then automatically communicates (e.g., transmits over a network) a copy of the data to a secondary, or backup, computer system where the backup data are stored. Consequently, if data loss occurs because a data storage device at the primary computer system fails, the data can be restored by copying the backup data from the secondary computer system to a new data storage device at the primary computer system.
To minimize the amount of storage space required at the secondary computer system for storing backup data, many data backup systems use some form of incremental backup scheme. In an incremental backup scheme, the backup agent executing at the primary computer system first generates an initial backup, often referred to as a baseline backup. A baseline backup includes all of the directories and/or files of a file system that a user has selected to be backed up. To generate it, the backup agent on the primary computer system locates all of the directories and/or files designated for backup and transfers a copy of them to a secondary computer system for storage. After the baseline backup has been generated, the backup agent periodically performs incremental backups. During each incremental backup, the backup agent locates and includes only those files (or blocks) that have changed since the previous backup was performed. Incremental backups may be performed at the file level, where each changed file is included in its entirety, or at the block level, where only the changed blocks of each changed file are included.
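The difference between a baseline backup, a file-level incremental backup, and a block-level incremental backup can be illustrated with a minimal sketch. It assumes, purely for illustration, that the selected files are held in memory as a mapping from path to bytes, that changes are detected by comparing per-block SHA-256 hashes against a manifest recorded at the previous backup, and that blocks are 4 bytes long. The function names (`baseline_backup`, `make_manifest`, `file_level_incremental`, `block_level_incremental`) and the `BLOCK_SIZE` value are hypothetical and do not describe any particular backup product.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical, deliberately tiny so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def baseline_backup(files: Dict[str, bytes]) -> Dict[str, bytes]:
    """Baseline: copy every selected file, changed or not."""
    return dict(files)


def make_manifest(files: Dict[str, bytes],
                  block_size: int = BLOCK_SIZE) -> Dict[str, List[str]]:
    """Record per-file block hashes so the next backup can detect changes."""
    return {path: block_hashes(data, block_size) for path, data in files.items()}


def file_level_incremental(manifest: Dict[str, List[str]],
                           files: Dict[str, bytes],
                           block_size: int = BLOCK_SIZE) -> Dict[str, bytes]:
    """File level: include every changed (or new) file in its entirety."""
    return {path: data for path, data in files.items()
            if manifest.get(path) != block_hashes(data, block_size)}


def block_level_incremental(manifest: Dict[str, List[str]],
                            files: Dict[str, bytes],
                            block_size: int = BLOCK_SIZE) -> Dict[str, Dict[int, bytes]]:
    """Block level: include only the changed blocks, keyed by block index."""
    changes: Dict[str, Dict[int, bytes]] = {}
    for path, data in files.items():
        old = manifest.get(path, [])
        new = block_hashes(data, block_size)
        changed = {i: data[i * block_size:(i + 1) * block_size]
                   for i, h in enumerate(new)
                   if i >= len(old) or old[i] != h}
        if changed:
            changes[path] = changed
    return changes


if __name__ == "__main__":
    files = {"a.txt": b"AAAABBBBCCCC", "b.txt": b"hello"}
    baseline = baseline_backup(files)   # both files are copied
    manifest = make_manifest(files)
    files["a.txt"] = b"AAAAXXXXCCCC"    # modify only the second block of a.txt
    print(file_level_incremental(manifest, files))   # {'a.txt': b'AAAAXXXXCCCC'}
    print(block_level_incremental(manifest, files))  # {'a.txt': {1: b'XXXX'}}
```

In this sketch, the file-level incremental backup resends all 12 bytes of `a.txt`, whereas the block-level incremental backup sends only the 4-byte block that changed. For brevity, the sketch does not record deleted files or truncated blocks, which a practical backup agent would also need to track.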
The primary computer system, which generates and/or stores the application data to be backed up, is often located at a different site than the secondary computer system, where the backed-up data are stored. As a result, if a catastrophic event occurs at the site of the primary computer system, the secondary computer system, being at a different site, would not be affected. Typically, the two computer systems are communicatively coupled by a network, over which the backup agent on the primary computer system sends the backup data to the secondary computer system.
Because the initial baseline backup includes all directories and/or files from one or more volumes selected for backup, the amount of data to be transferred in a baseline backup can be very large, and typically much larger than any individual incremental backup. When the network connection between the primary and secondary computer systems cannot carry large amounts of data quickly (because of bandwidth or throughput constraints), the transfer of the baseline backup data from the primary computer system to the secondary computer system may take a long time, or it may interfere with the normal operation of the network. Consequently, in certain situations it may be more efficient to use an alternative transport mechanism to carry the baseline backup data to the location of the secondary computer system (e.g., a data center). For example, the baseline backup may be written to a portable storage device, which is then physically transported to the secondary computer system. However, a problem with this approach is that confidential and sensitive data may be compromised if the portable storage device falls into the wrong hands.
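A back-of-the-envelope calculation shows why the size gap between a baseline backup and an incremental backup matters on a constrained network link. The sketch below assumes a hypothetical 2 TB baseline backup, a hypothetical 5 GB incremental backup, and a hypothetical 20 Mbit/s link; these figures are illustrative only and are not taken from any measured system. The `transfer_hours` function and its `efficiency` parameter (a stand-in for protocol overhead and link sharing) are likewise hypothetical.

```python
def transfer_hours(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Estimate hours to send size_bytes over a link of link_bps bits per second.

    efficiency (0 < efficiency <= 1) is a hypothetical factor for the fraction
    of nominal bandwidth actually available to the backup transfer.
    """
    if link_bps <= 0:
        raise ValueError("link_bps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8                     # bytes -> bits
    seconds = bits / (link_bps * efficiency)  # time at the effective rate
    return seconds / 3600                     # seconds -> hours


if __name__ == "__main__":
    # Hypothetical sizes: 2 TB baseline vs. 5 GB incremental, both over 20 Mbit/s.
    baseline_h = transfer_hours(2e12, 20e6)
    incremental_h = transfer_hours(5e9, 20e6)
    print(f"baseline: {baseline_h:.1f} h")        # baseline: 222.2 h
    print(f"incremental: {incremental_h:.2f} h")  # incremental: 0.56 h
```

Under these assumed figures, the baseline transfer would occupy the link for roughly nine days even at full nominal bandwidth, while a typical incremental transfer would finish in about half an hour; any efficiency below 1 lengthens both times proportionally.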