1. Field of the Invention
Embodiments of the present invention generally relate to data protection systems and, more particularly, to a method and apparatus for optimizing a backup chain using synthetic backups.
2. Description of the Related Art
Recovery from a data loss event normally requires an effective backup process. The backup process is used to create copies or backups of a data storage device, which can be used to restore original file data following the data loss event. The backup process protects against software failures, hardware failures and errors committed by the user. Software failures are bugs or procedural errors in, for example, a server application that corrupt the contents of file data. Hardware failures can range from the failure of a single hard disk to the destruction of an entire data center, making some or all files of the file data unavailable. Errors committed by the user include accidental deletion or overwriting of files that are later required. In each of these cases, the loss of file data delays or prevents a user or set of users from using the file data accurately. Conventionally, backup systems use two types of backups: full and incremental.
During a full backup, the backup system transfers a copy of all the data on one or more storage devices to a backup set comprising one or more backups. During subsequent incremental backups, the backup system copies the data that has changed since either the last full backup or the most recent incremental backup and stores the copies as one or more incremental backups. Differential backups are incremental backups that capture the data that has changed since the last full backup. Some systems implement incremental backups by capturing files that have changed or by identifying changed portions of the files and capturing only the changed file data. Other systems implement incremental backups by capturing changed cluster data of an underlying file system or changed sector data (i.e., the format actually used to store the data on a disk).
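By way of illustration only, the distinction between full, incremental and differential backups can be sketched at the sector level. The sketch assumes a volume modeled as a mapping from sector number to sector contents; the function names `full_backup` and `changed_since` are hypothetical, and deleted sectors are ignored for brevity.

```python
def full_backup(volume):
    """Copy every sector currently on the volume."""
    return dict(volume)


def changed_since(volume, reference):
    """Return only the sectors whose contents differ from a reference state."""
    return {s: data for s, data in volume.items() if reference.get(s) != data}


# Hypothetical volume: sector number -> sector contents.
volume = {0: b"boot", 1: b"aaaa", 2: b"bbbb"}
base = full_backup(volume)

volume[1] = b"AAAA"                              # first change
inc1 = changed_since(volume, base)               # changes since the full backup
state_after_inc1 = dict(volume)

volume[2] = b"BBBB"                              # second change
inc2 = changed_since(volume, state_after_inc1)   # incremental: since the previous backup
diff2 = changed_since(volume, base)              # differential: since the last full backup
```

In this sketch the second incremental backup holds only sector 2, whereas the differential backup taken at the same point-in-time holds sectors 1 and 2.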
Currently, certain data protection applications (e.g., Backup Exec System Recovery (BESR)) are configured to generate two kinds of backups: a full or base backup and an incremental backup. For example, BESR creates a backup chain comprising an extremely large number of incremental backups based on a full backup. Each incremental backup comprises file data that has been changed since the previous incremental backup or the base backup. The previous incremental backup is considered to be the parent of a current incremental backup. Furthermore, access to every previous backup, back to the base backup, is required in order to assemble a complete backup.
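The parent relationship described above can be sketched as a linked chain, purely as an illustration. The `Backup` class and the `read_sector` function are hypothetical and are not taken from BESR; the sketch assumes each backup stores only the sectors it captured plus a reference to its parent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Backup:
    sectors: Dict[int, bytes]            # sectors captured by this backup only
    parent: Optional["Backup"] = None    # None marks the base (full) backup


def read_sector(newest: Backup, sector: int) -> Tuple[bytes, int]:
    """Walk from the newest backup toward the base until the sector is found.

    Returns the sector contents and the number of backups that were opened.
    """
    opened = 0
    node = newest
    while node is not None:
        opened += 1
        if sector in node.sectors:
            return node.sectors[sector], opened
        node = node.parent
    raise KeyError(f"sector {sector} is not present anywhere in the chain")


base = Backup({0: b"a", 1: b"b"})
inc1 = Backup({1: b"B"}, parent=base)
inc2 = Backup({2: b"c"}, parent=inc1)

data_unchanged, opened_unchanged = read_sector(inc2, 0)  # only the base holds sector 0
data_changed, opened_changed = read_sector(inc2, 1)      # found in inc1
```

A sector that has not changed since the base backup forces the walk through every backup in the chain, which is why a missing link makes the complete backup impossible to assemble.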
BESR creates a synthetic incremental backup from one or more incremental backups (e.g., through a collapse or image consolidation). The synthetic incremental backup corresponds to the file data represented by the collapsed one or more incremental backups. The synthetic incremental backup has a parent backup that may be a synthetic incremental or synthetic base backup. The synthetic incremental backup contains the file data that has been modified since a point-in-time captured by the parent backup. However, the collapse does not change the base or full backup. The base backup can be combined with a portion of the following incremental backups in the backup chain to create a synthetic base backup. The synthetic base backup captures the same set of data as would have been captured had a full backup been created at the point-in-time of the last incremental backup that was rolled up into the synthetic base backup. The rolling up of the base backup reads all the data in a targeted point-in-time, including the data that has not changed since the original base backup was taken.
Such backup processing requires a commitment of substantial computing resources. Within a computer network comprising at least one server and a plurality of client computers, rolling up and collapsing can strain a client computer's resources as well as the network resources, processing resources and the like of the overall system. However, at some point in time, the administrator often no longer needs the individual incremental backups. For example, while the backup process may be performed every fifteen minutes, if the user wants to restore data from two weeks ago, any backup from two weeks ago is most likely sufficient, and access to any other backup at the fifteen-minute resolution is not needed as long as the accessed backup is complete.
At that resolution, a single synthetic incremental backup can be created to represent (e.g., convert) the ninety-six incremental backups taken in one day (i.e., four incremental backups per hour for twenty-four hours).
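As a minimal sketch of the collapse and roll-up operations, assume each incremental backup is a mapping from sector number to sector contents, ordered oldest first. The function names `collapse` and `roll_up` and the generated sample data are hypothetical and serve only to make the ninety-six-backup example concrete.

```python
BACKUPS_PER_HOUR = 4   # one backup every fifteen minutes
HOURS_PER_DAY = 24


def collapse(incrementals):
    """Merge incrementals (oldest first) into one synthetic incremental.

    When several incrementals captured the same sector, the newest copy wins.
    """
    merged = {}
    for inc in incrementals:
        merged.update(inc)
    return merged


def roll_up(base, incrementals):
    """Combine a base backup with following incrementals into a synthetic base.

    Every sector of the base is read, including sectors that never changed.
    """
    synthetic_base = dict(base)
    synthetic_base.update(collapse(incrementals))
    return synthetic_base


# Hypothetical day of backups: each one rewrites a single sector out of ten.
daily = [{i % 10: f"v{i}".encode()}
         for i in range(BACKUPS_PER_HOUR * HOURS_PER_DAY)]
synthetic_incremental = collapse(daily)

base = {s: b"base" for s in range(20)}
synthetic_base = roll_up(base, daily)
```

Here the ninety-six single-sector incrementals reduce to one synthetic incremental holding ten sectors, while the roll-up must still carry the ten untouched base sectors into the synthetic base, which illustrates why rolling up is the more expensive of the two operations.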
Moreover, a long backup chain generated by BESR strains the resources of the overall system because more storage space and processing power are required to access each and every previous incremental backup back to the base backup in order to assemble the complete backup. Furthermore, restoring data from the long backup chain requires opening all of the backups in the chain, calculating the location of each sector and reading the necessary sectors to locate the correct incremental file data required by the user. To execute the calculation, each backup in the chain requires some memory to hold a bitmap indicating which sectors are actually present in the chain. Even with significant memory optimization of the bitmap structure, BESR is not able to support backup chains longer than a few thousand incremental backups when working with large volumes (e.g., 100 GB or greater). Furthermore, if the backup chain is located across a slow network connection such as a virtual private network (VPN), the cost of opening and reading from many backup files can be prohibitive.
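The scale of the bitmap memory cost can be estimated with a rough calculation. The figures below are assumptions chosen for illustration and do not describe the actual BESR bitmap structure: a flat, unoptimized bitmap with one bit per sector, 512-byte sectors, a 100 GB volume taken as 100 × 2^30 bytes, and a hypothetical chain of 2,000 backups.

```python
SECTOR_SIZE = 512            # bytes per sector (assumed)
VOLUME_BYTES = 100 * 2**30   # 100 GB volume, taken as 100 GiB (assumed)
CHAIN_LENGTH = 2000          # backups in the chain (assumed)


def bitmap_bytes(volume_bytes, sector_size=SECTOR_SIZE):
    """Size of a flat bitmap holding one bit per sector of the volume."""
    sectors = -(-volume_bytes // sector_size)   # ceiling division
    return -(-sectors // 8)                     # eight sectors per byte


per_backup = bitmap_bytes(VOLUME_BYTES)
whole_chain = per_backup * CHAIN_LENGTH

print(f"bitmap per backup: {per_backup / 2**20:.0f} MiB")
print(f"bitmaps for chain: {whole_chain / 2**30:.1f} GiB")
```

Under these assumptions each backup needs about 25 MiB of bitmap, so a naive implementation would need tens of gibibytes for the whole chain. An optimized structure would need far less, but the cost still grows with the number of backups in the chain.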
Therefore, there is a need in the art for a method and apparatus for processing a backup chain using synthetic backups (e.g., synthetic incremental backups) to optimize the backup chain in an efficient and cost-effective manner.