A data warehouse is a repository of historical data of an enterprise. For example, the data warehouse may be used by the enterprise to make forward-looking business decisions based on the historical performance of the enterprise. An underlying implementation of a data warehouse may be, for example, a commercially available relational database, such as one provided by Oracle Corporation. An example of a data warehouse is a database maintained by a retail chain, to which records of retail transactions are periodically (e.g., nightly) uploaded from computers of each store of the chain.
It is desirable to back up a data warehouse such that, in the event of corruption, failure, or some other event with respect to the data warehouse, the data warehouse can be restored from the backup. Data warehouses typically contain very large amounts of data, on the order of terabytes or more. Both backing up and restoring such large amounts of data can be very time-consuming.
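To give a sense of scale, the following minimal sketch estimates how long a straight copy of a warehouse might take. The function name `transfer_hours` and the 200 MB/s sustained throughput are assumptions chosen only for illustration; actual rates depend on the storage and network involved.

```python
def transfer_hours(size_terabytes: float, throughput_mb_per_s: float) -> float:
    """Hours needed to copy size_terabytes at a sustained throughput.

    Uses decimal units (1 TB = 1,000,000 MB). Both inputs are
    illustrative assumptions, not measurements.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_terabytes * 1_000_000
    seconds = size_mb / throughput_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical warehouse sizes copied at an assumed 200 MB/s.
    for terabytes in (1, 10, 50):
        hours = transfer_hours(terabytes, 200.0)
        print(f"{terabytes} TB at 200 MB/s: about {hours:.1f} hours")
```

Under these assumptions, a warehouse of tens of terabytes takes more than a day to copy, and a restore of the same data would take a comparable amount of time.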
As an example of a backup approach, a database may be put into “hot backup” mode and the data files copied to backup storage, such as tapes or backup disk storage. An example of a backup utility is Oracle Recovery Manager (RMAN), provided by Oracle Corporation. To support RMAN or hot backups, the database is typically configured to generate archive logs during the process of data loading. As such, every data modification (e.g., insert, update, or delete) is logged, and the logs are then archived in order to support a later database recovery.
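The role of the archived log in recovery can be illustrated with a toy model. The sketch below is not how Oracle or RMAN is implemented; `ToyDatabase`, `take_backup`, and `recover` are hypothetical names, and the backup is modeled as an instantaneous copy for simplicity. It only shows the principle that a backup plus the modifications logged after it suffices to rebuild the current state.

```python
import copy


class ToyDatabase:
    """Toy key-value table that logs every modification (illustrative only)."""

    def __init__(self):
        self.rows = {}
        self.log = []  # stands in for the archived log

    def insert(self, key, value):
        self.rows[key] = value
        self.log.append(("insert", key, value))

    def update(self, key, value):
        if key not in self.rows:
            raise KeyError(key)
        self.rows[key] = value
        self.log.append(("update", key, value))

    def delete(self, key):
        del self.rows[key]
        self.log.append(("delete", key, None))

    def take_backup(self):
        """Return a copy of the rows and the log position at copy time."""
        return copy.deepcopy(self.rows), len(self.log)


def recover(backup_rows, log_position, archived_log):
    """Rebuild the state by replaying entries logged after the backup."""
    rows = copy.deepcopy(backup_rows)
    for operation, key, value in archived_log[log_position:]:
        if operation == "delete":
            rows.pop(key, None)
        else:  # insert or update
            rows[key] = value
    return rows


if __name__ == "__main__":
    db = ToyDatabase()
    db.insert("txn-1", 10)
    backup_rows, position = db.take_backup()
    db.update("txn-1", 12)
    db.insert("txn-2", 5)
    db.delete("txn-1")
    restored = recover(backup_rows, position, db.log)
    print(restored == db.rows)
```

Note that every insert, update, and delete in this model appends a log entry, so the log grows with the volume of data loaded.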
The generation of archive logs can consume substantial processing and input/output resources, which can impact the process of data loading (e.g., can slow the data loading or can necessitate the use of additional resources for the data loading). In addition, the archive logs themselves can contain a large amount of data and therefore occupy a large amount of disk space.