Field of the Invention
The present invention is directed to methods and computer-based systems for optimizing data recovery in a parallel database system.
Discussion of the Background
Many businesses today take advantage of relational database technology in order to store and access critical business data. Where a business stores a large quantity of disparate business data within a single database, the industry refers to such a data store as a data warehouse. Data warehousing has become very popular in that it allows a business to query data in many differing ways to learn new things with respect to customers, markets, trends, and even internal operations. Some of these large databases exist as serial entities where operations are performed between one computer (node), and one database instance (application and associated data in a single pool). The current trend, however, is for large corporate data warehouses to utilize parallel database management systems, (DBMS), whereby multiple nodes access relational data in parallel across multiple instances of the database. This allows for huge performance improvements, but also adds considerable operational complexity. One such area of complexity is in the area of data recovery in the event that a data loss or corruption occurs. The present invention addresses the field of data recovery in a parallel database environment.
All database systems are vulnerable to failures that can cause data loss or corruption. Commercial database systems (e.g., DB2, SQL Server, Oracle, Sybase, etc.) typically provide a mechanism by which data can be restored in the event of a failure. Most commercial database systems restore data by utilizing the most recent backup copy of the database in conjunction with the database's transaction log. In order to accomplish data restoration using the traditional roll-forward mechanism, the database management software typically utilizes backups of a full image database and transaction logs. A transaction log is a file that records all changes to table data since the last full image database backup. The transaction logs capture the state of the database before and after changes are made. Transaction logs contain less information than databases, and are thus backed up more frequently than databases. In a typical database system, many transaction log backups would occur between each successive full image database backup. Further, as discussed above, in a parallel database system, such logs are generated for each node interacting with the database and stored independently and then indexed to the database catalog.
Although parallel databases are becoming increasingly popular due to the amount of data that they can support and the speed at which queries can be processed, tools to manage recovery of table data in these databases are immature and insufficient. Currently, technical staffs supporting such environments are forced to recover the entire database table-space to a specific point-in-time via some external back-up media, such as tape, before recovery of specific lost or corrupted data can be accomplished. This recovery method is commonly referred to as “roll-forward recovery”.
Many prior art database reconstruction techniques follow the procedure detailed below. These prior art reconstruction techniques restore the database by using the most recent full image backup. The database is then “rolled forward” to a point in time, just prior to the time of the failure, by reapplying every transaction from each transaction log backup file saved since the last full image database backup. This procedure effectively restores the database to the state in which it existed just prior to the database failure.
These traditional backup and recovery techniques are designed to protect data from hardware and media failures, such as disk crashes. In a typical database system, however, a more likely cause of data corruption is a user or application error. An incorrectly formed ad-hoc query or an application software error may inappropriately delete or modify data in the database. Traditional restore and roll-forward mechanisms are sub-optimal and inefficient for recovering data from user or application errors.
In the first place, roll-forward reconstruction requires restoring a vast amount of data. The complete set of transactions that occurred since the last database backup, which must be reapplied during roll-forward recovery from a system failure, is typically much larger than a set of transactions that could be “rolled back,” thereby correcting a user or application error. As recognized by the present inventors, “Undoing” a small set of erroneous transactions is a far more efficient method of restoring a database than is reapplying all valid transactions that occurred since the last database backup prior to a failure.
In addition, the traditional roll-forward mechanism restores the database by rolling forward a restored backup image to a specific point in time. When recovering from user or application errors, this point in time is chosen to be the start time of an erroneous transaction that corrupted the database. By rolling the database forward to this particular point in time, the traditional mechanism effectively removes all transactions that may have occurred after the start of the erroneous transaction. By only rolling forward to the beginning point of the erroneous transaction, many transactions that occurred subsequent to the error, but that may have nonetheless have been valid transactions wholly independent of the specific database failure, are lost.
Some database systems (e.g., DB2, Microsoft SQL Server, Oracle) provide a recovery feature in which a “restore and roll-forward” may be performed on an individual table basis. Although to some extent this feature mitigates the inefficiency of the roll-forward method described above, it is still not sufficiently granular to permit the selection of erroneous transactions only. Also, since very large tables are common in data warehouses, the size of the tables may further limit the efficiency of the roll-forward procedure. Furthermore, this feature applies the selectivity criteria at the database object level, not the transaction level, resulting in recovery of only coarse-grained database objects, such as a table. None of the prior art restore and roll-forward methods allow recovery of fine-grained objects, such as an individual row within a table.
In view of the above-mentioned deficiencies, the present inventors have discovered there is a need for a method and apparatus for restoring a database that mitigates the inefficiencies of traditional roll-forward techniques, while simultaneously allowing recovery of fine-grained objects.