An incremental backup system can generally be described as a system in which a full backup is initially performed, and subsequently only changes relative to the full backup are stored. The full backup can be periodically updated, generally by adding the incremental change files to the previous full backup. Some systems may even keep track of two or three full backups, e.g., on a rolling or circular buffering basis.
For example, as shown in FIG. 1A, at t1, a full backup F1 of a system, disk, volume, or other collection of objects is performed. Restoring the system to the data stored in F1 reproduces the system conditions at t1. At t2, a full backup is unnecessary due to the likelihood that most of the data has not changed. Accordingly, an incremental or partial backup F1_I1 is performed, whereby typically only ‘changed’ portions of the backup target are recorded, so that the addition of F1 and F1_I1 produces the information for a system backup to time t2. Similarly, at t3, another incremental backup F1_I2 is performed that represents the change from t2 to t3. Accordingly, the addition of F1, F1_I1 and F1_I2 produces the information for a system backup to t3, and the process of producing additional incremental backups may continue indefinitely.
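The "addition" of a full backup and its incremental backups described above can be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that a backup is a mapping from object names to contents and that an incremental backup records a deleted object as None; the names Snapshot, Incremental and restore are hypothetical and do not come from any particular backup product.

```python
from typing import Dict, List, Optional

# Assumption: a full backup maps each object name to its content.
Snapshot = Dict[str, str]
# Assumption: an incremental maps each changed name to its new content,
# with None marking an object deleted since the previous backup.
Incremental = Dict[str, Optional[str]]


def restore(full: Snapshot, incrementals: List[Incremental]) -> Snapshot:
    """Apply incrementals, oldest first, on top of a copy of the full backup."""
    state = dict(full)
    for delta in incrementals:
        for name, content in delta.items():
            if content is None:
                state.pop(name, None)  # object was deleted
            else:
                state[name] = content  # object was added or changed
    return state


# Illustrative data for FIG. 1A: F1 at t1, F1_I1 at t2, F1_I2 at t3.
f1 = {"a.txt": "v1", "b.txt": "v1"}
f1_i1 = {"a.txt": "v2"}
f1_i2 = {"b.txt": None, "c.txt": "v1"}

state_t2 = restore(f1, [f1_i1])
state_t3 = restore(f1, [f1_i1, f1_i2])
```

Because each incremental is applied in time order, a later change to an object overrides an earlier one, which is why every incremental between the full backup and the target time must be processed.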
As shown in FIG. 1B, another full backup F2 can be performed, e.g., at t4, and subsequently incremental backups F2_I1, F2_I2, etc. for corresponding full backup F2 may be produced. The incremental backups F2_I1, F2_I2, and so on represent change to the backup target since full backup F2. Thus, as compared to FIG. 1A, the computer system performing full backups F1 and F2 can restore to, e.g., t5, more easily. In FIG. 1A, the addition of F1, F1_I1, F1_I2, F1_I3 and F1_I4 produces the information for a system backup to t5, whereas for FIG. 1B, only F2 and F2_I1 need be processed.
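The difference between FIG. 1A and FIG. 1B in the number of files processed for a restore to t5 can be sketched as follows. The catalog structure, the BackupRecord type and the restore_chain function are hypothetical names introduced only for this illustration; the integer times stand in for t1, t2, and so on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupRecord:
    name: str
    time: int                   # 1 stands for t1, 2 for t2, and so on
    base: Optional[str] = None  # None for a full backup, else the full backup's name


def restore_chain(catalog: List[BackupRecord], target_time: int) -> List[str]:
    """Return the backups to process, in order, to restore to target_time."""
    fulls = [r for r in catalog if r.base is None and r.time <= target_time]
    if not fulls:
        raise ValueError("no full backup at or before the target time")
    full = max(fulls, key=lambda r: r.time)  # most recent usable full backup
    incrementals = sorted(
        (r for r in catalog if r.base == full.name and r.time <= target_time),
        key=lambda r: r.time,
    )
    return [full.name] + [r.name for r in incrementals]


fig_1a = [BackupRecord("F1", 1)] + [
    BackupRecord(f"F1_I{i}", i + 1, base="F1") for i in range(1, 5)
]
fig_1b = [
    BackupRecord("F1", 1),
    BackupRecord("F1_I1", 2, base="F1"),
    BackupRecord("F1_I2", 3, base="F1"),
    BackupRecord("F2", 4),
    BackupRecord("F2_I1", 5, base="F2"),
]

chain_1a = restore_chain(fig_1a, 5)
chain_1b = restore_chain(fig_1b, 5)
```

Under these assumptions, the chain for FIG. 1A contains five backups while the chain for FIG. 1B contains two, matching the comparison made above.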
Furthermore, as shown in FIG. 1C, multiple full backup/incremental processes can be run simultaneously. After full backups F1 and F2, incremental backup files may be produced for both F1 and F2 separately, and F1 and F2 may have different files, volumes, etc. as the target for backup. The cumulative incremental calculation to a particular time further simplifies potential backup calculations and operations. FIG. 1D illustrates a cumulative to time N (CTN) calculation in the context of FIG. 1A. For example, in FIG. 1A, to produce a snapshot of t8, an addition of F1 together with F1_I1, F1_I2, F1_I3, F1_I4, F1_I5, F1_I6 and F1_I7 is performed for a system backup. Since the number of incremental calculations and subsequent additions needed to produce a snapshot at time N may eventually become inefficient or unmanageable, a cumulative backup may be computed to a time N.
In FIG. 1D, a cumulative backup for an entire system to time t6 (denoted F1_CT6) is calculated, representing or embodying the change of incrementals F1_I1, F1_I2, F1_I3 and F1_I4 from F1, as well as any change occurring from t5 to t6. Thus, to compute a backup of the entire system to time t6, F1 and F1_CT6 are processed or added, eliminating the need to process the individual incrementals F1_I1, F1_I2, F1_I3, F1_I4 and F1_I5. To produce such a cumulative backup to a time, e.g., t6, prior art techniques have examined the change of the target backup object from the time of a recent full backup to the target time for the cumulative backup. This is a comparison of the state of the system at the time of the full backup to the state of the system at the time of the cumulative backup. Thus, to produce a cumulative backup file, prior art techniques in essence produce an additional or separate incremental file for a time interval, measured from the full backup, that is greater than the time interval for a typical incremental file.
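The state comparison attributed above to prior art techniques can be sketched as a difference between two states of the backup target. As an assumption made only for this illustration, a state is modeled as a mapping from object names to contents, and a deleted object is recorded as None; the functions diff and apply_delta are hypothetical.

```python
from typing import Dict, Optional

Snapshot = Dict[str, str]            # assumption: object name -> content
Delta = Dict[str, Optional[str]]     # assumption: None marks a deleted object


def diff(base: Snapshot, current: Snapshot) -> Delta:
    """Compare the state at the full backup with the state at the target time."""
    delta: Delta = {
        name: content
        for name, content in current.items()
        if base.get(name) != content  # added or changed since the full backup
    }
    for name in base:
        if name not in current:
            delta[name] = None        # removed since the full backup
    return delta


def apply_delta(base: Snapshot, delta: Delta) -> Snapshot:
    """Add a cumulative (or incremental) backup to a copy of a full backup."""
    state = dict(base)
    for name, content in delta.items():
        if content is None:
            state.pop(name, None)
        else:
            state[name] = content
    return state


# Illustrative states: the system at t1 (captured as F1) and at t6.
f1 = {"a.txt": "v1", "b.txt": "v1"}
system_at_t6 = {"a.txt": "v3", "c.txt": "v1"}

f1_ct6 = diff(f1, system_at_t6)         # plays the role of F1_CT6
restored_t6 = apply_delta(f1, f1_ct6)   # F1 plus F1_CT6
```

In this sketch the cumulative file is computed without reading any of the individual incremental files, which reflects the description above of a single incremental measured over a longer interval.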
However, with respect to these types of incremental backup systems, a problem exists whereby particular kinds of restore operations may not require processing a full backup including each and every incremental backup file. In essence, processing a full backup including each and every incremental backup file is computer resource intensive and correspondingly time consuming, i.e., it takes a long time to perform such a backup. For example, if only a word processing application crashes, only files incidental to and dependent upon that application should be the subject of restoring. Furthermore, certain crashes may be recurring, and thus information about files incidental to these crashes is valuable and may be the source of efficient restore operations for these types of crashes. Additionally, there is redundancy of information from incremental file to incremental file that is not exploited when each incremental file is utilized in a restore process.
In response to these difficulties, a back end system with co-location keys was developed. A typical back end system administers a collection of tapes that sequentially store incremental backup files. Picking out a set of files unique to a particular restore operation from the collection of tapes and co-locating them on a single tape avoids time and resource intensive searching and allows faster and more efficient restore operations. However, today's tape co-location techniques are implemented in conjunction with an on-line computer system in order to perform the operations incident to co-location tape generation, and significant advances in storage size and access have occurred since the development of current generations of backup systems. On-line resources are premium resources compared to off-line resources, and storage solutions have proliferated since the days of tape backup storage. Consequently, computing resources are wasted to accommodate prior art techniques.
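The idea of co-location keys can be illustrated with a short sketch. The representation below is an assumption made for illustration only: each tape is modeled as a list of entries, each entry pairs a backup file name with a co-location key, and the key and file names are hypothetical. The sketch does not represent how any actual back end system stores or tags its tapes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: a tape entry is (backup file name, co-location key).
TapeEntry = Tuple[str, str]


def colocate(tapes: List[List[TapeEntry]]) -> Dict[str, List[str]]:
    """Gather files sharing a co-location key into one list per key.

    Each resulting list stands for the contents of a single co-located tape.
    """
    colocated: Dict[str, List[str]] = defaultdict(list)
    for tape in tapes:                  # tapes are scanned sequentially
        for file_name, key in tape:
            colocated[key].append(file_name)
    return dict(colocated)


tapes = [
    [("F1_I1/doc.cfg", "word_processor"), ("F1_I1/sys.log", "system")],
    [("F1_I2/doc.cfg", "word_processor")],
]
by_key = colocate(tapes)
```

A restore for the hypothetical "word_processor" key would then read one co-located list rather than searching every tape in the collection.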
Thus, it would be desirable to provide a technique that provides off-line collection and management of backup file subsets for different types of restore operations. It would be further advantageous to be able to input the portion(s) of a system to which efficient backup techniques would be suited. It would be further advantageous to monitor and analyze aspects of system restore operations, so that inefficiencies resulting from existing system backup or restore operations may be detected and remedied through the use of cumulative backup techniques tailored to the inefficiency. It would also be beneficial to utilize information contained in incremental files to promote efficiency, such as through the retrospective analysis of such information to produce cumulative backup file(s).