Data objects (e.g., databases, file systems, files within a file system, etc.) are typically stored on memory devices such as hard disks. Data objects, however, are subject to data corruption as a result of hardware, software, or human error. Moreover, a hard disk failure can render every data object stored on that disk inaccessible. These problems have motivated the creation of data object recovery systems. The present invention will be described with reference to the recovery of a corrupted database, it being understood that the present invention should not be limited thereto.
In general, database recovery systems recover a database to a known, consistent data state that existed just prior to an event (e.g., an inadvertent deletion of a record in a database) that caused the corruption. FIG. 1 illustrates, in block diagram form, relevant components of a data processing system 10 which employs an exemplary database recovery mechanism. FIG. 1 and its description should not be considered prior art to the claims below.
Data processing system 10 includes a computer system 12 (e.g., a server) coupled to a data storage system 14. Computer system 12 includes a database manager 20 coupled to an application program 22, and a backup client 24. Each of the components 20-24 may take form in computer program instructions executing on one or more processors of computer system 12.
Data storage system 14 consists of a plurality of memory devices (e.g., disk arrays) that are logically aggregated to create a logical disk 26, which in turn stores a file system containing files accessible by database manager 20 and backup client 24. The file system on logical disk 26 stores a file (not shown) containing exemplary Database A. While the present invention will be described with reference to recovering Database A after corruption, the present invention may also be employed to recover two or more corrupted databases, or one corrupted database amongst several uncorrupted databases. Although FIG. 1 shows database manager 20 and backup client 24 coupled directly to logical disk 26, the term coupled should not be limited thereto; database manager 20 and backup client 24 could be coupled indirectly to logical disk 26 via a file system manager and a volume manager (not shown).
As will be more fully described below, database manager 20 employs backup client 24 when implementing a database recovery process. Backup client 24 is in data communication with backup server 40, which in turn is coupled to backup memory system 42. For purposes of explanation only, backup memory system 42 includes a robotic tape handler (not shown) having access to several magnetic tapes (hereinafter “backup tapes”) upon which backup copies of Database A and transaction log extents (more fully described below) are stored. Further, backup memory system 42 includes one or more tape drives into which backup tapes are inserted by the robotic tape handler. Inserting a backup tape into a tape drive is often referred to herein as mounting the tape. A backup tape can be read only when it is mounted. In another embodiment, backup memory system 42 may include one or more disk arrays that can be aggregated to form a logical backup disk on which backup copies can be stored.
Database manager 20 generates database transactions in response to receiving instructions from application program 22. Database transactions, when implemented or committed, modify or add to the contents of Database A. Database manager 20 also creates database transaction log extents into which database transactions are logged in the order they are generated. The database transactions are logged before being committed. Logged transactions are committed to Database A either when resources are available or when scheduled.
In one embodiment, transaction log extents are formed as files within a directory TL of the file system on disk 26. Database transactions, when generated, are logged to the most recently created transaction log extent. Once a transaction log extent is filled with transactions, database manager 20 creates a new extent to store subsequently generated transactions. The first transaction log extent created by database manager 20 is designated TLE1. The subsequently generated transaction log extents are sequentially designated TLE2, TLE3, etc. The first transaction generated by database manager 20 is designated T1. Subsequently generated transactions are sequentially designated T2, T3, etc.
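The numbering and rollover of extents described above can be sketched as follows. This is a minimal illustration only, not the database manager's actual implementation: the class `TransactionLog`, the constant `EXTENT_CAPACITY`, and the choice to create a new extent lazily (when the next transaction arrives rather than at the moment the previous extent fills) are assumptions made for the example.

```python
from dataclasses import dataclass, field

EXTENT_CAPACITY = 3  # hypothetical number of entries per extent


@dataclass
class TransactionLog:
    """Models directory TL as a mapping of extent name -> logged transactions."""
    capacity: int = EXTENT_CAPACITY
    extents: dict = field(default_factory=dict)
    next_extent: int = 1       # number of the next extent to create (TLE1 first)
    next_transaction: int = 1  # number of the next transaction (T1 first)

    def log(self) -> str:
        """Log a new transaction and return the name of the extent that holds it."""
        current = f"TLE{self.next_extent - 1}"
        # Create a new extent if none exists in the directory or the latest is full.
        if current not in self.extents or len(self.extents[current]) >= self.capacity:
            current = f"TLE{self.next_extent}"
            self.extents[current] = []
            self.next_extent += 1
        self.extents[current].append(f"T{self.next_transaction}")
        self.next_transaction += 1
        return current
```

With a capacity of two, five calls to `log()` place T1 and T2 in TLE1, T3 and T4 in TLE2, and T5 in TLE3.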
FIG. 2A illustrates a graphical representation of directory TL stored on the file system of disk 26. Directory TL includes the first two transaction log extents TLE1 and TLE2 created by database manager 20. FIG. 2B illustrates relevant contents of an exemplary transaction log extent TLE1. It should be understood that the present invention should not be limited to using transaction log extents exemplified in FIG. 2A. Rather, database transactions can be logged to transaction log extents with a structure different than that shown in FIG. 2B, or in a data object other than a transaction log extent such as that shown in FIG. 2A.
TLE1 shown in FIG. 2B contains i entries. Each entry includes a logged database transaction (e.g., T3), a time stamp, and a commit flag. When database manager 20 generates a transaction, the transaction is logged to the next available entry within a transaction log extent such as TLE1 shown in FIG. 2B. When a transaction is first logged within an entry of a transaction log extent, the commit flag for the entry is initially set to logical zero. Once that transaction is committed to Database A, the commit flag is switched to logical one. Further, when the logged transaction is committed to Database A, the time at which the transaction is committed is entered as the time stamp within the transaction log extent entry for that transaction.
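One possible in-memory shape for such an entry is sketched below. The class name `LogEntry`, its field names, and the use of `None` for the time stamp of an entry that has not yet been committed are assumptions for illustration; the description above does not specify what the time stamp holds before commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    transaction: str                    # logged transaction, e.g. "T3"
    time_stamp: Optional[float] = None  # assumed empty until the transaction commits
    commit_flag: int = 0                # logical zero when first logged

    def commit(self, committed_at: float) -> None:
        """Mark the entry committed and record the commit time as its time stamp."""
        self.commit_flag = 1
        self.time_stamp = committed_at
```

An entry created as `LogEntry("T3")` starts with `commit_flag == 0`; calling `commit(t)` switches the flag to one and stores `t` as the time stamp.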
Backup client 24, operating in conjunction with backup server 40, performs backup operations at regularly scheduled times (e.g., nightly at 2:00 AM). As will be more fully described below, backup operations may be full or incremental. In a full backup operation, a backup copy of Database A is created and stored on one or more backup tapes within backup memory system 42. For purposes of explanation only, it will be presumed that a backup of Database A can be created and stored on a single backup tape. Additionally, a copy of each transaction log extent in directory TL is created and stored on a backup tape within backup memory system 42 during each full backup operation. During incremental backup operations, a backup copy of each transaction log extent contained in directory TL is created and stored on a backup tape within backup memory system 42. The contents of the file containing Database A are not copied to a backup tape during an incremental backup operation. After each incremental or full backup operation, database manager 20 removes each transaction log extent from directory TL, regardless of whether the transaction log extent is completely filled with logged transactions.
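The difference between the two backup operations can be sketched as follows. The function `run_backup`, the dictionary-based stand-ins for Database A and directory TL, and the list standing in for the backup tapes are all hypothetical; only the behavior (what is copied, and the removal of extents afterwards) follows the description above.

```python
def run_backup(kind, database, directory_tl, backup_store):
    """Perform a 'full' or 'incremental' backup and return the stored record.

    database:     dict standing in for the contents of Database A
    directory_tl: dict mapping extent name -> list of logged transactions
    backup_store: list standing in for the backup tapes
    """
    if kind not in ("full", "incremental"):
        raise ValueError(f"unknown backup kind: {kind!r}")
    # Both kinds copy every transaction log extent currently in directory TL.
    record = {
        "kind": kind,
        "extents": {name: list(txns) for name, txns in directory_tl.items()},
    }
    # Only a full backup also copies Database A itself.
    if kind == "full":
        record["database"] = dict(database)
    backup_store.append(record)
    # Extents are removed from directory TL after either kind of backup,
    # whether or not they are completely filled.
    directory_tl.clear()
    return record
```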
Incremental backup operations take less time to complete when compared to full backup operations since, in most cases, the quantity of data stored within transaction log extents on directory TL is small when compared to the quantity of data contained within Database A. As such, incremental backup operations are preferred over full backup operations since the backup window (i.e., the time needed to perform the backup operation) may be small. The present invention will be described with full backup operations scheduled for once a week (e.g., Sunday at 2:00 AM), and incremental backup operations scheduled daily, except for the day when a full backup operation is performed.
If Database A is the subject of a data corrupting event, backup copies of Database A and transaction log extents are used by database manager 20 to restore Database A to a known, consistent data state that existed prior to the corrupting event. In the recovery process, Database A is initially restored to the state it occupied at a point in time, prior to the corrupting event, when a full backup operation was performed. In other words, Database A stored on disk 26 is replaced with backup copy BCA(T) stored on a backup tape, where BCA(T) is the backup copy of Database A created at time T before time TE, the time of the corrupting event. Thereafter, transactions logged after creation of backup copy BCA(T) and before TE are replayed (or recommitted). Before the database recovery process begins, database manager 20 will identify Tm as the first logged transaction to be replayed after Database A is restored to BCA(T). Logged transactions beginning with Tm are replayed in order until Database A is recovered to a known, consistent state that occurred just prior to time TE. Logged transactions can be replayed so long as the logged transactions are contained in a transaction log extent on directory TL. As noted above, however, database manager 20 removes transaction log extents from directory TL after each full or incremental backup operation. If database manager 20 discovers that a needed transaction log extent is not available on directory TL during the database recovery process, database manager 20 can request backup client 24 to restore the needed transaction log extent to directory TL as will be more fully described below.
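The choice of backup copy BCA(T) can be illustrated with a short sketch. The description above requires only that T precede TE; picking the most recent such full backup is an assumption made here because it minimizes the number of transactions to replay. The function name `select_backup_time` is hypothetical.

```python
def select_backup_time(full_backup_times, te):
    """Return time T of the full backup to restore, given corruption time TE.

    Assumes the most recent full backup taken strictly before TE is wanted.
    """
    candidates = [t for t in full_backup_times if t < te]
    if not candidates:
        raise LookupError("no full backup precedes the corrupting event")
    return max(candidates)
```

For example, with weekly full backups at times 7, 14, and 21 and a corrupting event at time 16, the sketch selects the backup taken at time 14.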
FIG. 3 illustrates relevant, operational aspects of an exemplary database recovery mechanism implemented by database manager 20. The process shown in FIG. 3 is implemented after backup client 24, operating in conjunction with backup server 40, restores Database A to the data state it occupied at time T. At step 52, database manager 20 sets variable m to m−1, where m is identified by database manager 20 as the number of the first logged transaction to be replayed.
In step 54, database manager 20 increments m by 1. Thereafter, database manager 20 determines whether directory TL on disk 26 contains TLEn, the transaction log extent that, in turn, contains transaction Tm. If transaction log extent TLEn is stored on disk 26 under directory TL, the process proceeds to step 66 where Database A is updated or modified by replaying transaction Tm. However, if directory TL does not contain transaction log extent TLEn, the transaction log extent must be restored to directory TL, and the process proceeds to step 60 where database manager 20 generates a request for restoration of transaction log extent TLEn. Database manager 20 sends the request generated in step 60 to backup client 24, and the recovery process enters a pause mode defined by step 62. More particularly, the database recovery process pauses until database manager 20 receives confirmation from backup client 24 that transaction log extent TLEn has been restored to directory TL. Once restoration of TLEn is confirmed by backup client 24, database manager 20 increments n by 1 as shown in step 64, and Database A is updated by replaying transaction Tm in step 66. After replaying transaction Tm, the process proceeds to step 70 where database manager 20 determines whether Database A has been recovered to the known, consistent state that existed prior to data corruption. In one embodiment, the time stamps of the transaction log extent entries can be used to determine whether Database A has been recovered. The time stamp for each transaction replayed can be compared to time TC, the time at which Database A was in a known, consistent state just before data corruption. If the time stamp of the most recently replayed transaction equals time TC, Database A is considered recovered, and the process of FIG. 3 ends.
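The flow of FIG. 3 as described above can be sketched as a loop. This is an illustration under stated assumptions, not the mechanism itself: the function `recover`, the callbacks `extent_of` and `restore_extent`, and the tuple layout of each log entry are hypothetical. The sketch also assumes that the process returns to step 54 when step 70 finds Database A not yet recovered, and it derives the extent name from m rather than tracking n as in step 64.

```python
def recover(database, directory_tl, extent_of, restore_extent, first_m, tc):
    """Replay logged transactions until the entry stamped with time TC is replayed.

    database:       dict standing in for Database A (already restored to BCA(T))
    directory_tl:   dict mapping extent name -> {m: (time_stamp, key, value)}
    extent_of:      callable mapping transaction number m to its extent name
    restore_extent: callable returning an extent's contents; blocks until restored
    Returns the number of restore requests that were issued.
    """
    m = first_m - 1                                    # step 52
    restores = 0
    while True:
        m += 1                                         # step 54
        name = extent_of(m)
        if name not in directory_tl:
            # Steps 60-62: request restoration and pause until it is confirmed.
            directory_tl[name] = restore_extent(name)
            restores += 1
        time_stamp, key, value = directory_tl[name][m]
        database[key] = value                          # step 66: replay Tm
        if time_stamp == tc:                           # step 70: recovered?
            return restores
```

Each pass through the `restore_extent` branch corresponds to one pause of the recovery process.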
As noted in FIG. 3, database manager 20 in step 60 generates one or more requests for restoration of transaction log extent TLEn to directory TL. Each of these requests is received by backup client 24. For each request received by backup client 24, backup client 24 generates a corresponding request for the backup copy of transaction log extent TLEn. Backup client 24 sends the corresponding request to backup server 40. Software executing on backup server 40, in response to receiving the corresponding request from client 24, accesses a catalog or other stored information to identify the backup tape that stores the requested transaction log extent (i.e., the backup copy of TLEn). Once the appropriate tape is identified, backup server 40 instructs the robotic tape handler to insert the identified tape into a tape drive of backup memory system 42. The catalog mentioned above also identifies the location on the backup tape where the requested transaction log extent can be found. Once mounted, the backup tape is forwarded to the position where the requested transaction log extent can be read. Thereafter, a read/write head reads the requested transaction log extent as the tape is further forwarded. The backup copy of TLEn is subsequently provided to backup client 24 via backup server 40. Backup client 24 in turn adds the backup copy of TLEn to directory TL on logical disk 26. Backup client 24 then informs database manager 20.
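The request chain described above can be sketched as follows. The class `BackupServer`, the function `restore_extent`, the catalog layout (extent name mapped to a tape identifier and a position), and the mount counter are all hypothetical stand-ins; a real backup server and tape handler would involve far more than dictionary lookups.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupServer:
    catalog: dict                  # extent name -> (tape id, position on tape)
    tapes: dict                    # tape id -> {position: extent contents}
    mounted: Optional[str] = None  # tape currently in the drive, if any
    mounts: int = 0                # number of mount operations performed

    def fetch(self, extent_name):
        """Locate the extent via the catalog, mount its tape if needed, and read it."""
        tape_id, position = self.catalog[extent_name]
        if self.mounted != tape_id:
            self.mounted = tape_id  # robotic handler inserts the identified tape
            self.mounts += 1
        return list(self.tapes[tape_id][position])  # forward to position and read


def restore_extent(extent_name, server, directory_tl):
    """Backup client role: obtain the backup copy, add it to directory TL, confirm."""
    directory_tl[extent_name] = server.fetch(extent_name)
    return True  # confirmation passed back to the database manager
```

In this sketch, restoring two extents that share a tape costs one mount, while an extent on a different tape forces a second mount.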
As noted, the database recovery process of FIG. 3 pauses each time database manager 20 generates a request for restoration of transaction log extent TLEn to directory TL. Each request may require a substantial amount of time to complete since (1) software executing on backup server 40 must identify the appropriate backup tape that contains the requested transaction log extent in addition to the tape position where the beginning of the requested transaction log extent can be found, (2) the robotic tape handler must insert the identified tape into a tape drive of backup memory system 42 if the identified tape is not already mounted, (3) the mounted tape is forwarded to the position where the desired transaction log extent copy can be read, and (4) the tape is further forwarded while the read/write head of the tape drive reads the requested transaction log extent from the backup tape. Each transaction log extent restoration using the aforementioned process adds to the overall time needed to complete the database recovery process of FIG. 3. This is true even in the alternative embodiment mentioned above in which backup memory system 42 includes a plurality of disk arrays rather than tapes for storing backup copies since some time is needed, for example, to identify the location in the plurality of disk arrays where the backup copy of the requested transaction log extent can be found.
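How these per-request costs accumulate can be shown with a simple additive model. The function `restore_delay` and its per-step durations are hypothetical, and the figures used in the example are illustrative values rather than measurements; the model only assumes that the four steps listed above occur serially for each request and that a mount is skipped when the needed tape is already in the drive.

```python
def restore_delay(requested_tapes, lookup_s, mount_s, seek_s, read_s):
    """Total pause, in seconds, for a sequence of extent restore requests.

    requested_tapes: tape identifier for each request, in request order
    """
    total = 0.0
    mounted = None
    for tape in requested_tapes:
        total += lookup_s          # (1) catalog lookup of tape and position
        if tape != mounted:
            total += mount_s       # (2) mount only if not already mounted
            mounted = tape
        total += seek_s + read_s   # (3) forward to position, (4) read extent
    return total
```

For three requests against tapes A, A, and B, with illustrative costs of 1 s lookup, 60 s mount, 30 s seek, and 5 s read, the model gives 3(1) + 2(60) + 3(35) = 228 seconds of pause added to the recovery.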