Traditional backup software uses a driver that tracks changes made to a persistent storage device, also called a hard disk herein. The tracked changes are used to back up only the parts of the disk that have changed since the last backup. However, such drivers require specialized code for each operating system. Implementing the drivers is also complex, because not a single change may be missed; this is particularly hard to ensure during a boot process.
Additionally, present backup methods do not handle complex situations efficiently. For example, some existing backup routines use an archive bit: one bit is assigned to each file, and the bit is turned on when data in that file is changed. A backup simply retrieves and replicates the files whose bit is turned on. When the backup is completed, all the archive bits are cleared. A drawback is that this scheme breaks down when an additional backup application uses the same interface, because each application resets the bits that the other relies on. Even worse, the additional backup application would not detect the problem. Also, the archive bit corresponds to an entire file, so if only one part of a file is changed, the whole file is backed up.
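The archive-bit breakdown described above can be illustrated with a minimal sketch. The `File` class and the `archive_bit_backup` function are hypothetical names introduced only for illustration; they do not correspond to any particular operating system interface.

```python
class File:
    """Hypothetical file with a single archive bit (illustration only)."""

    def __init__(self, name, data):
        self.name = name
        self.data = data
        self.archive_bit = True  # a new file has not been backed up yet

    def write(self, data):
        self.data = data
        self.archive_bit = True  # any change turns the bit on


def archive_bit_backup(files):
    """Copy every file whose archive bit is on, then clear all bits."""
    copied = {f.name: f.data for f in files if f.archive_bit}
    for f in files:
        f.archive_bit = False
    return copied


files = [File("a.txt", "v1"), File("b.txt", "v1")]

app_a_first = archive_bit_backup(files)   # application A copies both files
files[0].write("v2")                      # a.txt changes after A's backup
app_b = archive_bit_backup(files)         # application B copies a.txt, clears the bit
app_a_second = archive_bit_backup(files)  # application A now sees no changes
```

In this sketch, application A never receives the second version of `a.txt`, application B never receives `b.txt`, and neither application has any indication that something was missed. Changing a single byte of a file would also cause the whole file to be copied.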
Other existing backup methods use redo logs. Once a redo log is created, all changes to a disk are captured in the redo log. When a backup is to be performed, the data stored in the redo log is used for the backup. A new redo log is then created and the prior one is committed into the base disk. However, this method is costly in terms of the additional operations and additional disk space it requires, particularly if more than one application performs backups. This overhead stems, for example, from the fact that redo logs also preserve the prior state of the disk.
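A simplified sketch of the redo-log approach follows. The `RedoLogDisk` class, its block-indexed layout, and its method names are assumptions made for illustration and are not taken from any specific product.

```python
class RedoLogDisk:
    """Hypothetical disk whose writes are captured in a redo log."""

    def __init__(self, num_blocks):
        self.base = [b""] * num_blocks  # base disk, keeps the prior state
        self.redo = {}                  # redo log: block index -> new data

    def write(self, block, data):
        # The write goes to the log only; the base disk is left untouched.
        self.redo[block] = data

    def read(self, block):
        # Reads must consult the log first, then fall back to the base disk.
        return self.redo.get(block, self.base[block])

    def backup(self):
        # The log contents are the data to back up.
        changed = dict(self.redo)
        # Commit the prior log into the base disk and start a new log.
        for block, data in self.redo.items():
            self.base[block] = data
        self.redo = {}
        return changed


disk = RedoLogDisk(num_blocks=4)
disk.write(2, b"new")
old_and_new = (disk.base[2], disk.redo[2])  # both versions occupy space
visible = disk.read(2)
changed = disk.backup()
```

Until the log is committed, both the old and the new contents of block 2 are stored, every read takes an extra lookup, and the commit step writes the data a second time. This is one way to picture the additional operations and disk space mentioned above.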
Backup methods that use timestamps also require relatively heavy storage and/or processing. Also, if the backup is taken from an alternate location, such as a dedicated backup server, issues can arise if the clock of the virtual machine whose data is being backed up and the clock of the backup server are not tightly synchronized: if the clock on the backup server is ahead of the clock in the virtual machine, backups might be incomplete.
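The clock-skew problem can be sketched as follows. The sketch assumes, for illustration only, that the virtual machine stamps each modified block with its own clock, that the backup server records the time of the last backup with its own clock, and that the server clock is 30 seconds ahead; the function name and all values are hypothetical.

```python
def changed_since(mtimes, last_backup_time):
    """Return the blocks whose modification time is after the last backup."""
    return sorted(b for b, t in mtimes.items() if t > last_backup_time)


SKEW = 30                # assumed: server clock is 30 s ahead of the VM clock
vm_time_at_backup = 1000
server_time_at_backup = vm_time_at_backup + SKEW  # what the server records

# Block modification times, stamped by the virtual machine's clock.
mtimes = {0: 900}        # block 0 changed before the backup
mtimes[1] = 1010         # block 1 changed 10 s after the backup (VM clock)

# Next incremental backup, comparing VM stamps with the server's record.
selected_with_skew = changed_since(mtimes, server_time_at_backup)

# The same comparison with synchronized clocks.
selected_in_sync = changed_since(mtimes, vm_time_at_backup)
```

With the skewed server clock, block 1 appears older than the last backup and is skipped, so the incremental backup is incomplete. With synchronized clocks, the same block is selected.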
Another backup method uses checksums. While this method can deliver incremental image-level backups, its scalability is limited. For example, every time a backup is performed, the backup application has to read the entire disk to be backed up. Hence, the load on the data source is not reduced compared to performing a full backup every time. Also, reliable checksums (e.g., cryptographic hashes) can be expensive to compute.
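A minimal sketch of a checksum-based incremental backup is given below. The block size, the choice of SHA-256, and the function names are assumptions for illustration; the disk is modeled as an in-memory byte string.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen to keep the example readable


def block_digests(disk):
    """Hash every block of the disk; this reads the whole disk."""
    return [
        hashlib.sha256(disk[i:i + BLOCK_SIZE]).digest()
        for i in range(0, len(disk), BLOCK_SIZE)
    ]


def checksum_backup(disk, previous_digests):
    """Return (changed block indices, new digests, number of blocks read)."""
    current = block_digests(disk)
    changed = [
        i for i, digest in enumerate(current)
        if i >= len(previous_digests) or digest != previous_digests[i]
    ]
    return changed, current, len(current)


disk_v1 = b"aaaabbbbcccc"
first_changed, digests, first_read = checksum_backup(disk_v1, [])

disk_v2 = b"aaaaBBBBcccc"  # only the middle block differs
second_changed, digests, second_read = checksum_backup(disk_v2, digests)
```

Only one block is copied by the second backup, but all three blocks still have to be read and hashed to find it. The read load on the data source is therefore the same as for a full backup.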