1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to restoring data from backup storage.
2. Description of the Related Art
In disaster recovery backups, data is physically transferred from the primary storage media to the backup media. The backup may be to either disk or tape, though tape has traditionally dominated this market. With the continuing reduction in the cost of disk storage, more sites are switching to disk as the backup medium. In addition to the lower cost, disk storage tends to occupy less space and is faster than tape. Although disk tends to be faster than tape, disk backups and restores still typically result in a considerable amount of application down time (typically hours).
In high-end applications, primary storage disks are typically high performance (e.g. EMC, Hitachi, or IBM arrays). Purchasing and maintaining equivalent sets of disk arrays to perform mirroring can be very expensive. Therefore, many sites use inexpensive, mediocre-performance solutions for backup storage (e.g. arrays of IDE disks). Typically, because of the poor performance of such backup storage, users of high-end applications do not use it as a “mirror” that the production system can switch to and run from. For this and other reasons, mirroring and switching to a backup image to run in a production system may not be a viable solution for many enterprises.
In addition, disaster recovery backups are typically not just copies of data, as mirrors are. A backup application may include backup-specific information or formatting with the backed-up data. A backup application may write to disk as if it were writing to tape, e.g. in TAR format. Therefore, the backed-up data in backup storage may not be in a format that can be switched to directly to serve as the primary data in a production system.
In general, data moved to or from storage devices is accessed using either block-level or file-level access. File-level access requires some knowledge of the underlying file system and/or volume management system used to organize data on the storage devices. This type of information is typically available only at the host level, and thus I/O operations utilizing file-level access must be performed or at least managed by software executing on a host computer. Block-level access uses physical storage device addresses to access data and thus need not be “assisted” by some entity having file system and/or volume knowledge.
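The distinction between the two access types may be illustrated with a minimal sketch. The sketch is illustrative only and is not taken from any particular system: the function names `read_blocks` and `read_file` and the 512-byte `BLOCK_SIZE` are assumptions, and an ordinary file may stand in for a storage device.

```python
import os

BLOCK_SIZE = 512  # assumed block size in bytes (hypothetical)


def read_blocks(device_path: str, start_block: int, count: int) -> bytes:
    """Block-level access: data is located purely by block address.

    No file system or volume knowledge is needed; the caller supplies
    a starting block number and a number of blocks.
    """
    with open(device_path, "rb") as dev:
        dev.seek(start_block * BLOCK_SIZE)
        return dev.read(count * BLOCK_SIZE)


def read_file(mount_point: str, relative_path: str) -> bytes:
    """File-level access: data is located by name.

    The host's file system resolves the name to the underlying blocks,
    so this call only works on a host that has the file system mounted.
    """
    with open(os.path.join(mount_point, relative_path), "rb") as f:
        return f.read()
```

In this sketch, `read_blocks` needs only an address, whereas `read_file` depends on the host to translate a file name into storage locations.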
A data restore application may restore data from backup storage to primary storage using the addresses of the source and destination devices and blocks. Such address information is typically in the form of an extent list having one or more extents. An extent is typically a contiguous set of storage blocks allocated for a file portion, a file, or multiple files. Extents are typically represented by a device address indication, a starting block address on that device, and a length (number of contiguous blocks). However, extents can be defined in a variety of different ways, e.g., a starting address and an ending address, no device information explicitly included, etc. Thus, an extent is generally any information used to locate a desired portion of a storage resource.
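By way of illustration only, an extent in the typical form described above (device indication, starting block, length) and a block-level copy driven by source and destination extent lists might be sketched as follows. The names `Extent` and `restore_extents`, the 512-byte `BLOCK_SIZE`, and the use of in-memory byte arrays in place of storage devices are all assumptions made for this sketch and do not describe any particular restore application.

```python
from dataclasses import dataclass
from typing import Dict, List

BLOCK_SIZE = 512  # assumed block size in bytes (hypothetical)


@dataclass(frozen=True)
class Extent:
    device: str       # device address indication
    start_block: int  # starting block address on that device
    length: int       # number of contiguous blocks


def restore_extents(devices: Dict[str, bytearray],
                    source: List[Extent],
                    dest: List[Extent]) -> int:
    """Copy each source extent to the destination extent paired with it.

    `devices` maps a device name to a bytearray standing in for the
    device's storage. Returns the total number of blocks copied.
    """
    if len(source) != len(dest):
        raise ValueError("extent lists must have the same number of extents")
    copied = 0
    for src, dst in zip(source, dest):
        if src.length != dst.length:
            raise ValueError("paired extents must have equal length")
        s = src.start_block * BLOCK_SIZE
        d = dst.start_block * BLOCK_SIZE
        n = src.length * BLOCK_SIZE
        # Reject extents that run past the end of either device.
        if s + n > len(devices[src.device]) or d + n > len(devices[dst.device]):
            raise ValueError("extent exceeds device size")
        devices[dst.device][d:d + n] = devices[src.device][s:s + n]
        copied += src.length
    return copied
```

For example, `restore_extents(devices, [Extent("backup0", 1, 2)], [Extent("primary0", 0, 2)])` would copy two blocks starting at block 1 of a device named `backup0` to blocks 0 and 1 of a device named `primary0`; both device names are hypothetical.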
Typically, during restores, an application will have to wait for a file to be fully restored before accessing the file. Since a restore operation may restore files in any order, an application may have to wait a considerable amount of time for a particular file to be fully restored. Large databases may include hundreds of gigabytes or even terabytes of data; restores of these databases may take hours or even days before the data reaches a stable state. In many cases, applications may have to wait until all of the data is restored before they can access any of the data.
Therefore, it is desirable to provide a restore mechanism that has reduced impact on production applications. It is also desirable to restore needed data from disk-based disaster recovery backups in a near-instantaneous manner from the production application's perspective. It is also desirable to allow applications to remain active and to access data being restored while the restore is in progress, in a manner transparent to the applications.