Modern databases can be key tools that help users keep track of critical business transactions. In many cases, database loss can be disastrous to an enterprise. A database loss may result from multiple causes: hardware failure, software failure, facility failure, or a natural disaster affecting any of the structures that support the database. Thus, database recovery after a loss is an important aspect of proper database management. In one aspect of database management, full database backups are typically made at periodic intervals, such as weekly, to provide a backbone for recovery. A functional full database backup is a prerequisite for applying the partial, differential daily database backups needed to recover a database to the day just preceding the database loss.
Typically, database administrators make a weekly full database backup and store the backup on some type of media. Often, that media is magnetic tape. This weekly full database backup is the backbone of a recovery mechanism for the database. However, a question may remain as to whether the database backup, as stored on the media, is corrupted. A corrupted database backup may not be fully recoverable. One prior art method to check the viability of a database backup is to reconstitute the database backup into a second working copy of the original database and perform tests on that copy. This method is expensive in terms of resources because modern databases may be terabytes in size. Often an enterprise may not have an unused second terabyte of disk or random access storage media readily available upon which to perform a database backup integrity check. However, such an integrity check is vital to the reliability of a database backup strategy.
These vital tests on a database may include tests on its internal linked data structures. Linked data structures, such as B-trees, are logical arrangements of data that facilitate efficient and organized data storage, data manipulation, and data retrieval. The basic component of a linked data structure is known as an element or node. Individual nodes in a linked data structure are linked together by special fields called pointers that identify or “point to” neighboring nodes in a linked structure of nodes. A pointer is also sometimes referred to as a reference.
Each node of a linked data structure must be accurately represented because logically neighboring nodes of a linked data structure are not necessarily stored in adjacent physical locations on a storage device. Absent a guarantee of physical proximity from one node to the next on a storage device, it is difficult to know which node is actually the next logical node in a linked data structure if an invalid pointer exists. Thus, a pointer that does not correctly point to a next logical node within the linked data structure can render the entire linked data structure unreliable and unusable.
FIG. 1 depicts a basic B-tree structure in a database showing multiple links and requiring multiple pointers. Database nodes A, B, C, D and E (102, 104, 106, 108, 110 respectively) may have hierarchical relationships to one another supported via pointers. For example, the root node, A 102, has two child nodes B 104 and C 106 with pointers 152 and 154 respectively. Nodes B and C are siblings and may have forward and back pointers 156 and 158. Leaf nodes D 108 and E 110 are child nodes of node B 104 and may also have forward and back pointers 160 and 162 between nodes B and D and pointers 164 and 166 between nodes B and E. Nodes D and E are siblings to each other and may have forward and back pointers 168 and 170. If pointer 164 between nodes B and E and pointer 168 between nodes D and E were lost or incorrect, the link to node E from either node B or node D would be lost. This would result in a loss of a corresponding data association in a database.
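The pointer relationships of FIG. 1 may be illustrated with a minimal Python sketch. The sketch is an assumption made for illustration only and is not the structure of any particular database: the names `Node`, `build_fig1`, `find_pointer_errors`, and `reachable` are hypothetical, nodes are identified by their letters rather than by page addresses, and each parent-to-child link is simplified to a single child pointer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """One node of a linked structure; pointers are stored as node names."""
    name: str
    children: List[str] = field(default_factory=list)  # parent-to-child pointers
    next: Optional[str] = None  # forward sibling pointer
    prev: Optional[str] = None  # back sibling pointer


def build_fig1() -> Dict[str, Node]:
    """Build nodes A through E with the relationships described for FIG. 1."""
    return {
        "A": Node("A", children=["B", "C"]),
        "B": Node("B", children=["D", "E"], next="C"),
        "C": Node("C", prev="B"),
        "D": Node("D", next="E"),
        "E": Node("E", prev="D"),
    }


def find_pointer_errors(nodes: Dict[str, Node]) -> List[str]:
    """Report pointers that name a missing node or lack a matching reverse pointer."""
    errors = []
    for node in nodes.values():
        for child in node.children:
            if child not in nodes:
                errors.append(f"{node.name}: child pointer to missing node {child}")
        if node.next is not None:
            target = nodes.get(node.next)
            if target is None or target.prev != node.name:
                errors.append(f"{node.name}: forward pointer to {node.next} is not mirrored")
        if node.prev is not None:
            target = nodes.get(node.prev)
            if target is None or target.next != node.name:
                errors.append(f"{node.name}: back pointer to {node.prev} is not mirrored")
    return errors


def reachable(nodes: Dict[str, Node], root: str) -> Set[str]:
    """Return the names of nodes reachable from root via child and forward pointers."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen or name not in nodes:
            continue
        seen.add(name)
        stack.extend(nodes[name].children)
        if nodes[name].next is not None:
            stack.append(nodes[name].next)
    return seen
```

In this sketch, removing node E from the children of node B and clearing the forward pointer of node D corresponds to losing pointers 164 and 168. Node E then no longer appears in the result of `reachable(nodes, "A")`, and `find_pointer_errors` reports the back pointer of node E as unmatched.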
In general, if any pointer in a linked data structure becomes corrupted or otherwise invalid so that a pointer does not correctly point to what is intended as the next node or child node, then the integrity of the entire data structure is compromised. A compromised data structure is neither reliable nor usable. Although it is possible to identify an invalid pointer in a small linked data structure having only a few nodes, the task becomes very complex and costly in terms of time and/or computing resources for large linked data structures having millions of nodes, and more difficult still if more than one invalid pointer exists among the nodes.
It is therefore vital to verify the consistency of such pointers in a database. Additionally, it is vital to perform a consistency check on the database backup to verify the integrity of the database backup and to guarantee that a full recovery is possible. However, there may be practical problems in running a consistency check on a backup of a database.
FIG. 2 depicts a typical database 200 containing data files 205, 206, 208 and a log file 210. The data files are further divided into storage blocks, also referred to as pages. These storage blocks hold the records in a database and hold the nodes associated with linked data structures such as B-trees. A typical database backup 250 of the database 200 may not necessarily contain backups of all the literal data files 205, 206, 208 that make up the database 200. Instead, a database backup merely needs to contain the blocks currently in use from the data files in some, possibly different, order and format. For example, the database backup 250 contains backup file 220, which is a backup of data files 205-208 in the original database 200. Data file 205 may contain many thousands of pages but only a few pages (illustrated in FIG. 2 using the descriptive numerical notation “data file: page”) may be used by the database 200. In the example of FIG. 2, backup file 220 in database backup 250 will only contain two pages from data file 205; namely 205:3 and 205:2. A similar situation may also exist for data files 206 and 208. Thus, a database backup may not be a convenient or compatible environment or form for running a consistency check. A log file 210A may generally be associated with a database backup 250 of the database 200 to provide details of any transactions that changed the database 200 at the time when the database backup was being placed onto storage media.
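The difference between the layout of a data file and the layout of a database backup may be illustrated with a minimal Python sketch. The sketch rests on assumptions made for illustration only: the names `PageId`, `make_backup`, and `index_backup` are hypothetical, page contents are placeholder byte strings, and the backup is modeled as a simple list rather than any actual backup file format.

```python
from typing import Dict, List, Tuple

# A page is identified as (data file number, page number), mirroring the
# "data file: page" notation of FIG. 2, for example (205, 3) for 205:3.
PageId = Tuple[int, int]


def make_backup(
    data_files: Dict[int, Dict[int, bytes]],
    in_use: List[PageId],
) -> List[Tuple[PageId, bytes]]:
    """Copy only the in-use pages, in the order given, into a flat backup stream."""
    return [(pid, data_files[pid[0]][pid[1]]) for pid in in_use]


def index_backup(backup: List[Tuple[PageId, bytes]]) -> Dict[PageId, int]:
    """Map each page identifier to its position within the backup stream."""
    return {pid: position for position, (pid, _) in enumerate(backup)}
```

For example, if data file 205 holds one thousand pages of which only 205:3 and 205:2 are in use, `make_backup` produces a stream of two entries in that order. A page in the stream can no longer be located by its page number alone, so a mapping such as the one built by `index_backup` is needed before any pointer naming page 205:2 can be followed within the backup.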
Using prior art principles, checking a database backup requires three steps. The database must be reconstituted from the database backup; the transaction log must be applied to the reconstituted database to recover it to the state that existed when the database backup operation was complete; and then consistency checks must be run. The disk or random access storage space required to perform this type of verification is at least as large as the original database. The verification also requires a great amount of time because prior art consistency checks make multiple passes through the database to verify such items as linking pointers and to find pages containing storage allocation maps and database schema metadata. This task may become even more onerous when a sequential medium such as tape is used for the database backup.
Thus, there exists a need for a system or method that verifies the integrity of a full database backup efficiently in terms of both storage space and time. The present invention addresses the aforementioned needs and solves them with additional advantages as expressed herein.