Data objects (e.g., databases, file directories, files within directories, email exchange mailboxes, etc.) are typically stored on memory devices such as hard disks. But hard disks fail at the worst times, and when they do, they take all the data objects stored on them with them. This problem motivated the creation of backup systems. In general, backup systems copy data objects to separate memory media (e.g., magnetic tapes) at regularly scheduled times. If data objects are lost due to hardware failure or other causes (e.g., software errors that corrupt the contents of data objects, or user errors such as inadvertent deletion of data objects), they can be restored to a known, consistent state using the backed-up copies stored on the separate memory media.
FIG. 1 illustrates, in block diagram form, relevant components of a data processing system 10 that employs an exemplary backup system. For purposes of description only, the present invention will be described with reference to backing up files of a file system, it being understood that the present invention should not be limited thereto. Rather, the present invention could be used for backing up other types of data objects.
FIG. 1 shows a computer system 12 (e.g., an application server) coupled to a disk array 14 containing several hard disks. A volume manager or other software executing on computer system 12 or disk array 14 logically aggregates the hard disks to create a logical disk that stores a data volume V. Computer system 12 implements a file system that manages files and their respective metadata, which are stored on volume V. File metadata is data about the data contained in a respective file. File metadata usually includes a filename. Metadata may also include information about its corresponding file's position within the hierarchy of a file system. In addition, file metadata may include file attributes or properties, time stamps, security information, lists of block addresses where each file's data is stored, etc. A file's attributes are typically defined by a multibit field, each bit of which is set to logical one or zero. One bit (the archive bit) indicates whether the data within the corresponding file has changed since some previous point in time (e.g., since the last time the file was copied to backup memory via a backup operation). Another attribute bit (the read-only bit) indicates whether the corresponding file is a read-only file. Still another attribute bit (the directory bit) indicates whether the corresponding file is a directory. The time stamps in a file's metadata may include, for example, a modification time stamp indicating when the file was last modified via a write operation, a creation time stamp indicating the date the file was created, an access time stamp indicating when the file was last accessed, etc.
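To make the multibit attributes field concrete, the following Python sketch decodes the archive, read-only, and directory bits from an integer attributes value. The bit positions are an assumption borrowed from the Windows file attribute values that Python's `stat` module exposes (`FILE_ATTRIBUTE_READONLY`, `FILE_ATTRIBUTE_DIRECTORY`, `FILE_ATTRIBUTE_ARCHIVE`); the description above does not specify any particular layout, and `decode_attributes` is a hypothetical helper.

```python
import stat

# Assumed bit layout: the Windows attribute values exposed by Python's stat
# module on all platforms. Used only to illustrate a multibit attributes field.
ATTRIBUTE_BITS = {
    "read_only": stat.FILE_ATTRIBUTE_READONLY,   # 0x01
    "directory": stat.FILE_ATTRIBUTE_DIRECTORY,  # 0x10
    "archive": stat.FILE_ATTRIBUTE_ARCHIVE,      # 0x20
}


def decode_attributes(attrs: int) -> dict:
    """Report, for each named attribute bit, whether it is set to logical one."""
    return {name: bool(attrs & bit) for name, bit in ATTRIBUTE_BITS.items()}


# Example: a read-only file that has changed since its last backup.
example = stat.FILE_ATTRIBUTE_READONLY | stat.FILE_ATTRIBUTE_ARCHIVE
print(decode_attributes(example))
```

Each attribute is tested with a bitwise AND against its mask, so any combination of bits can be set in the same field independently.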
Returning to FIG. 1, data processing system 10 includes a backup server 18 coupled to backup data storage subsystem (hereinafter backup memory system) 22 via storage interconnect 24. Backup memory system 22 may include a robotic tape handler (not shown) having access to several magnetic tapes (hereinafter backup tapes) upon which backup sets and/or backup catalogs (more fully described below) can be stored. Further, backup memory system 22 includes first and second tape drives 28a and 28b into which backup tapes are inserted as needed. Inserting a backup tape into a tape drive is referred to herein as mounting the tape. Lastly, computer system 12 and backup server 18 are coupled to each other via local area network (LAN) 26. Although not shown, LAN 26 may be shared by several other computer systems.
As noted, files and their respective metadata are stored on volume V. Backup software executing on backup server 18 and/or computer system 12 operates to create backup sets or copies of files and metadata stored on volume V at regularly scheduled times. As will be more fully described below, backup operations may be full (including synthetic full) or incremental. A full backup operation produces a full backup set or a copy of all files (and associated metadata) stored on volume V. An incremental backup operation produces an incremental backup set or a copy of only those files (and associated metadata) that have changed since some previous event (e.g., a prior full or incremental backup operation). During backup operations, LAN 26 transmits copies of files and their metadata from disk array 14 to backup memory system 22 via backup server 18. For the purposes of explanation only, backup sets are data objects that contain copies of files from volume V.
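The difference between a full and an incremental backup operation comes down to which files are selected for copying. The Python sketch below assumes, for illustration, that a file has "changed since some previous event" when its modification time stamp is later than the time of the previous backup operation (one of the change tests mentioned in this description). `FileRecord` and `select_for_backup` are hypothetical names, not part of any described system.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical stand-in for a file on volume V and its metadata."""
    name: str
    mtime: float  # modification time stamp, in seconds since the epoch


def select_for_backup(files, mode, since=None):
    """Return the files that a backup operation would copy.

    A "full" backup copies every file. An "incremental" backup copies only
    files whose modification time stamp is later than `since`, the time of
    the previous backup operation.
    """
    if mode == "full":
        return list(files)
    if mode == "incremental":
        if since is None:
            raise ValueError("an incremental backup needs a reference time")
        return [f for f in files if f.mtime > since]
    raise ValueError(f"unknown backup mode: {mode!r}")


volume_v = [FileRecord("a", 100.0), FileRecord("b", 250.0), FileRecord("c", 300.0)]
print([f.name for f in select_for_backup(volume_v, "incremental", since=200.0)])
```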
For most applications, incremental backup is preferable at backup time since, in most cases, the number of files on data volume V that change between backups is very small compared to the total number of files, and since the backup window (i.e., the time available to perform the backup operation) may be small. If backup operations are performed daily or even more frequently, it is not uncommon for less than 1% of files to change between backups. An incremental backup operation in this case copies roughly 1% of the data that a full backup would copy and uses roughly 1% of the input/output (I/O) resources. Incremental backup thus appears to be the preferred mode of guarding data against hardware or software failure. And so it is, until a full restore of all files on data volume V is needed. In a full restore, all files from the newest full backup set are copied from tape back to disk array 14. Then files from each newer incremental backup set are copied, in order of creation, from the respective tapes back to disk array 14. That can require a lot of backup tape handling performed by, for example, the robotic tape handler of backup memory system 22. For this reason, restoration from a single backup set is generally simpler and more reliable than restoration from a combination of full and incremental backup sets. For recovering from individual user errors, the situation is just the opposite. Users tend to work with one small set of files on volume V for a period of days or weeks and then work with a different set. Accordingly, there is a high probability that a file corrupted or lost as a result of user error will have been used recently and therefore will have been copied in one of the incremental backup operations. Since incremental backup sets contain a smaller fraction of data than a full backup set, they can usually be searched much faster if a restore of a particular file is needed. The ideal from the individual user's standpoint is therefore many small incremental backup operations.
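The full-restore sequence described above can be modeled in a few lines. In this Python sketch, which is illustrative only, each backup set is a hypothetical `(kind, contents)` pair in creation order, where `contents` maps a file name to that file's data. The restore starts from the newest full set and applies every newer incremental set in order, so later copies overwrite earlier ones. Deleted files are ignored, matching the simplifying assumption made later in this description.

```python
def full_restore(backup_sets):
    """Rebuild volume contents from the newest full set plus later incrementals.

    backup_sets: list of (kind, contents) pairs in creation order, where kind
    is "full" or "incremental" and contents maps file name -> file data.
    Returns (restored_contents, number_of_sets_read). With one backup set
    per tape, the second value is also the number of tapes to handle.
    """
    full_indices = [i for i, (kind, _) in enumerate(backup_sets) if kind == "full"]
    if not full_indices:
        raise ValueError("a full restore needs at least one full backup set")
    newest_full = full_indices[-1]

    restored = {}
    # Apply the newest full set first, then each newer incremental in order.
    for _, contents in backup_sets[newest_full:]:
        restored.update(contents)
    return restored, len(backup_sets) - newest_full


sets = [
    ("full", {"a": "a1", "b": "b1", "c": "c1"}),
    ("incremental", {"b": "b2"}),
    ("incremental", {"a": "a3", "b": "b3"}),
]
restored, tapes_read = full_restore(sets)
print(restored, tapes_read)
```

Even though only two files changed, all three tapes must be read, which is the tape-handling cost the paragraph above describes.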
Some backup systems offer a compromise: the ability to consolidate a baseline full backup set and several incremental backup sets into a new, more up-to-date full backup set, which becomes the baseline for further incremental backup operations. Although these synthetic full backup sets are costly in terms of the time needed to create them, they can simplify restoration processes.
FIGS. 2-4 illustrate relevant aspects of creating full, incremental, and synthetic full backup sets of files stored on volume V. FIG. 2 shows a logical disk 30 and several backup tapes 32(1)-32(m) and 34. Logical disk 30 stores the contents of data volume V, while the backup tapes are used to store backup sets. Backup server 18 creates a full backup set 1 on backup tape 32(1). Full backup set 1 includes a copy of all files (and associated metadata) stored on volume V at the time of the full backup operation. After creating full backup set 1, backup server 18 sequentially creates incremental backup sets 2-m on backup tapes 32(2)-32(m), respectively. It is noted that several backup sets could be stored on a single backup tape. However, it will be assumed, except where otherwise noted, that each backup tape stores a single backup set. Eventually, backup server 18 creates a synthetic full backup set on backup tape 34 from files of some or all of backup sets 1-m. All backup tapes 32(1)-32(m) and 34 are accessible by the robotic tape handler of backup memory system 22.
Computer system 12, in one embodiment, creates a backup set catalog (catalog) during each full or incremental backup operation. The catalogs identify the files copied during backup operations. Moreover, the catalogs list the order in which files are copied to and stored on backup tapes. Once created, the catalogs are provided for subsequent use by backup server 18. FIG. 3 shows backup catalogs 36(1)-36(m) corresponding to backup sets 1-m, respectively. Catalogs 36(1)-36(m) identify the files of backup sets 1-m, respectively, and the order in which those files were copied to backup tapes 32(1)-32(m). Although not shown, catalogs 36(1)-36(m) may identify locations on tapes 32(1)-32(m), respectively, where copied files can be accessed. For example, each catalog entry may contain a file offset and file size in addition to the file identification (file ID). All catalogs may be stored in cache memory (not shown) of backup server 18. The need for backup catalogs will become more apparent in the description of the process for creating synthetic full backup sets below.
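A catalog entry holding a file ID, file offset, and file size is enough to locate a copied file without scanning the tape. The Python sketch below is a minimal illustration under stated assumptions: the tape is simulated as an in-memory byte string, and `CatalogEntry`, `write_backup_set`, and `read_file` are hypothetical names. The catalog list preserves the order in which files were written, as the catalogs described above do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    file_id: str
    offset: int  # byte offset of the file's copy within the simulated tape image
    size: int    # length of the copy in bytes


def write_backup_set(files):
    """Append (file_id, data) pairs to a simulated tape image and build its catalog.

    Entries are recorded in write order, so the catalog also captures the
    order in which files were copied to the tape.
    """
    image = bytearray()
    catalog = []
    for file_id, data in files:
        catalog.append(CatalogEntry(file_id, len(image), len(data)))
        image.extend(data)
    return bytes(image), catalog


def read_file(image, catalog, file_id):
    """Use the catalog's offset and size to return a file's bytes, or None."""
    for entry in catalog:
        if entry.file_id == file_id:
            return image[entry.offset:entry.offset + entry.size]
    return None


tape, catalog = write_backup_set([("f1", b"hello"), ("f2", b"world!")])
print(catalog[1], read_file(tape, catalog, "f2"))
```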
Backup server 18 creates incremental backup sets after creating a full backup set, as noted above. To illustrate, backup server 18 creates incremental backup set 2 on tape 32(2) after it creates full backup set 1 on backup tape 32(1), where backup set 2 includes a copy of all files within volume V that were modified (e.g., written) since the creation of full backup set 1. There are many ways to identify files that have been modified since the creation of full backup set 1. For example, backup server 18 or an agent executing on computer system 12 may use modification time stamps to determine which files on volume V have been modified and should be copied to tape 32(2) during the incremental backup operation. Each time the contents of a file or its metadata are modified, the file system may update the file's modification time stamp to equal the time when the modification successfully completes. Backup server 18 or the agent may traverse the modification time stamps of the files on volume V, and when a time stamp is found that is later than the time of the last backup operation, the associated file (and its metadata) is deemed modified and is copied to backup tape 32(2). This process is repeated until the modification time stamps of all files on volume V have been examined. In an alternative method, backup server 18 or an agent executing on computer system 12 may use the archive bit of the attributes field to determine which files on volume V should be copied to backup tape 32(2) during the incremental backup operation. When a file or its metadata is modified, the file system may set the file's archive bit to logical one. During an incremental backup operation, the archive bits are traversed, and when an archive bit is found that is set to logical one, the associated file (and its metadata) is deemed modified and is copied to backup tape 32(2).
After copying, the archive bit is set back to logical zero. This process is repeated until the archive bits of all files have been traversed.
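The archive-bit method can be sketched as a single pass over the attributes fields. This Python example is illustrative only: it assumes the Windows archive-bit value exposed by Python's `stat` module, represents volume V as a hypothetical dictionary from file name to attributes value, and takes a caller-supplied `copy_file` callback in place of the actual write to backup tape 32(2).

```python
import stat


def archive_bit_incremental(attributes, copy_file):
    """Copy every file whose archive bit is set, then reset that bit to zero.

    attributes: dict mapping file name -> attributes field (int); it is
    updated in place. copy_file: callback standing in for copying the file
    and its metadata to the backup tape. Returns the copied names in order.
    """
    copied = []
    for name, attrs in attributes.items():
        if attrs & stat.FILE_ATTRIBUTE_ARCHIVE:
            copy_file(name)
            # Clear the archive bit only after the copy, leaving other bits intact.
            attributes[name] = attrs & ~stat.FILE_ATTRIBUTE_ARCHIVE
            copied.append(name)
    return copied


volume_attrs = {
    "a": stat.FILE_ATTRIBUTE_ARCHIVE,
    "b": 0,
    "c": stat.FILE_ATTRIBUTE_ARCHIVE | stat.FILE_ATTRIBUTE_READONLY,
}
tape_32_2 = []
print(archive_bit_incremental(volume_attrs, tape_32_2.append))
```

A second run immediately afterward copies nothing, because every archive bit has been reset; only files modified after the backup will have the bit set again.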
Backup server 18 can create the synthetic full backup set of files on volume V using one or more of backup sets 1-m and their associated catalogs. In general, the synthetic full backup set contains the most recent version of each file currently stored on volume V. Tape 34 shown in FIG. 2 is configured to store the synthetic full backup set created by backup server 18. The contents of catalogs 36(1)-36(m) can be used to determine which files of backup sets 1-m are to be combined to create the synthetic full backup set. It is noted that one or more files may be added to or deleted from data volume V during or between the creation of the full and incremental backup sets. However, for simplicity of description, it will be presumed that no files are added to or deleted from volume V during the backup processes described above.
FIG. 4 illustrates relevant operational aspects of a process for creating a synthetic full backup set using catalogs 36(1)-36(m) and backup sets 1-m. Before the process of FIG. 4 begins, a catalog of the files currently stored on volume V is created. FIG. 3 shows exemplary catalog 40, which lists all files currently stored on volume V. Catalog 40 is used to identify which files are needed from backup sets 1-m to create the synthetic full backup set on tape 34. After catalog 40 is created, a variable x is set to 0 and incremented by one, as shown in steps 50 and 52 of FIG. 4. Backup server 18 then sets a variable y to m+1 and decrements y by one, as shown in steps 54 and 56. Backup server 18 then begins a search for the most recent version of file x identified by catalog 40. More particularly, backup server 18 accesses catalog 36(y) to determine whether the most recent version of file x is contained within incremental backup set y, as shown in step 60. For the purposes of explanation, it will be presumed that catalogs 36(1)-36(m) are accessible in the cache memory of backup server 18.
When the process shown in FIG. 4 is first started, backup server 18 starts with catalog 36(y=m) in step 60 because it corresponds to the most recently created incremental backup set m. If catalog 36(y) indicates that file x is contained within incremental backup set y, and if tape 32(y) is mounted and the data object that contains backup set y is open, then backup server 18 copies file x from tape 32(y) to tape 34 as shown in step 66. As noted above, backup memory system 22 has two tape drives, tape drive 28a and tape drive 28b. During the process shown in FIG. 4, tape 34 is mounted on tape drive 28b and configured to receive and store files of the synthetic full backup set. If backup server 18 determines in step 60 that file x is in catalog 36(y), but tape 32(y) is not mounted on drive 28a, then the process proceeds to step 64, where the robotic tape handler removes the backup tape from drive 28a and mounts tape 32(y). It is noted that backup server 18 may have to close any open data object that contains a backup set on the backup tape mounted on tape drive 28a before the robotic tape handler removes that tape. After tape 32(y) is mounted on drive 28a, backup server 18 opens the data object that contains backup set y so that file x can be copied from tape 32(y) to tape 34 as shown in step 66. Once file x is copied to tape 34, backup server 18 may update entry x of catalog 40 to include location information (i.e., file offset and file size) identifying where file x can be found on tape 34. For purposes of explanation, opening a backup set is presumed to mean opening a data object (e.g., a file) that contains the backup set.
If backup server 18 determines in step 60 that file x is not identified in catalog 36(y), the process proceeds to step 70, where backup server 18 determines whether incremental backup set y is the first incremental backup set created after full backup set 1. Backup server 18 makes this determination by comparing the current value of variable y to 2. If y equals 2 in step 70, then the newest version of file x is contained in full backup set 1 and is copied from tape 32(1) to tape 34, as shown in step 76, if tape 32(1) is mounted on drive 28a of backup memory system 22. If tape 32(1) is not mounted, the robotic tape handler swaps the tape currently mounted on drive 28a with backup tape 32(1), and backup server 18 opens backup set 1. Backup server 18 may have to close the backup set stored on the tape in drive 28a before the robotic tape handler swaps tapes. Backup server 18 then copies file x from tape 32(1) to tape 34 in step 76. In step 68, backup server 18 optionally updates entry x of catalog 40 to include location information identifying where file x, copied in step 76, can be found.
If backup server 18 determines in step 70 that y does not equal 2, then incremental backup set y is not the first incremental backup set created after full backup set 1, and the process proceeds to step 56, where y is decremented and step 60 is repeated. Eventually, the most recent version of file x is found and copied to tape 34 in step 66 or 76, and catalog 40 is optionally updated accordingly. Thereafter, steps 54-64 are repeated for the next file of catalog 40. After the most recent versions of all files 1-n have been copied to tape 34, the process shown in FIG. 4 ends.
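The search loop of FIG. 4 can be sketched in Python to show how the newest version of each file is located and how often the tape on drive 28a changes. This is a simplified model under stated assumptions: each catalog 36(y) is reduced to a set of file IDs in a hypothetical `catalogs` dictionary keyed by y, each backup set occupies its own tape, actual data copying and catalog-40 updates are omitted, and opening, closing, and mounting are collapsed into a single mount counter. The step numbers in the comments are approximate correspondences, not a definitive mapping of the flowchart.

```python
def synthesize_full(catalog40, catalogs, m):
    """Plan a synthetic full backup in catalog 40 order (simplified FIG. 4).

    catalog40: ordered list of file IDs currently on volume V.
    catalogs: dict y -> set of file IDs listed in catalog 36(y); set 1 is the
    full backup set and sets 2..m are incrementals.
    Returns (plan, mounts): plan is a list of (file_id, source_set) pairs in
    copy order; mounts counts tapes mounted on drive 28a.
    """
    plan = []
    mounted = None  # backup set whose tape is currently on drive 28a
    mounts = 0
    for file_id in catalog40:                          # steps 50-52: next file x
        y = m                                          # steps 54-56: newest set first
        while y > 1 and file_id not in catalogs[y]:    # step 60: listed in 36(y)?
            y -= 1                                     # steps 70/56: try older set
        # y is now the newest incremental listing the file (step 66),
        # or 1, meaning the full set supplies it (step 76).
        if mounted != y:
            mounted = y                                # step 64: close, swap, open
            mounts += 1
        plan.append((file_id, y))
    return plan, mounts


catalogs = {1: {"f1", "f2", "f3", "f4"}, 2: {"f2", "f4"}, 3: {"f3"}}
plan, mounts = synthesize_full(["f1", "f2", "f3", "f4"], catalogs, m=3)
print(plan, mounts)
```

In this example only three source tapes exist, yet four mounts occur: the tape holding set 2 is mounted for f2, removed for f3, and mounted again for f4, because the copy order follows catalog 40 rather than the tape on which each newest version resides.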
When creating the synthetic full backup set, files are copied to tape 34 in the order defined by catalog 40, beginning with file 1 and ending with file n. One of ordinary skill can see from FIG. 4 that a substantial amount of processing and backup tape handling is needed for backup server 18 to create the synthetic full backup set on tape 34. Indeed, one of ordinary skill can appreciate that any one of the full or incremental backup sets 1-m may be opened and closed many times during the process to create the synthetic full backup set, since the order in which files are copied to tape 34 is defined by catalog 40 rather than by the backup sets in which the most recent versions reside. Moreover, any one of backup tapes 32(1)-32(m) may be repeatedly mounted on and removed from tape drive 28a during the process shown in FIG. 4.