1. Field of the Invention
The present invention relates to a computer program product, system, and method for selecting files to backup in a block level backup.
2. Description of the Related Art
Current backup solutions allow volumes or disks to be backed up at the block level. That is, instead of copying file after file into a destination location (also referred to as a repository), blocks of data (at either the disk or the volume level) are copied from the disk hosting the production volume into the repository. Block level backup techniques do not consider how the blocks are arranged into files, because they process the blocks of the volume according to their locations on the volume rather than according to the files to which the blocks belong. By contrast, a file level backup copies the data to the repository file after file.
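The difference between the two approaches can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory volume (a `bytes` object), a hypothetical 4-byte block size, and a hypothetical allocation map from file names to block indices; none of these names come from the source. The block level routine copies blocks by location and never consults the allocation map, while the file level routine follows each file's list of blocks.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # hypothetical tiny block size so the example stays readable


def read_block(volume: bytes, index: int) -> bytes:
    """Return the block at the given block index of the volume."""
    start = index * BLOCK_SIZE
    return volume[start:start + BLOCK_SIZE]


def block_level_backup(volume: bytes) -> List[bytes]:
    """Copy every block in order of its location on the volume.

    File boundaries are never consulted; the repository receives the
    blocks in the same order in which they appear on the volume.
    """
    num_blocks = len(volume) // BLOCK_SIZE
    return [read_block(volume, i) for i in range(num_blocks)]


def file_level_backup(volume: bytes, files: Dict[str, List[int]]) -> Dict[str, bytes]:
    """Copy file after file, following each file's (hypothetical) block list."""
    return {
        name: b"".join(read_block(volume, i) for i in blocks)
        for name, blocks in files.items()
    }
```

With `volume = b"AAAABBBBCCCCDDDD"` and `files = {"a.txt": [2, 0], "b.txt": [1]}`, the block level copy returns all four blocks in volume order, including block 3, which belongs to no file, whereas the file level copy returns only the data reachable through the two files, reassembled in each file's own block order.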
Block level backup applications implement a consistent, point-in-time, block level copy of the production volumes to the repository location. A consistent backup is one that allows a volume or disk to be restored to a consistent state, meaning that all transactions, of both the file system and the production application, are complete. Block level backup processes include point-in-time copy, which replicates data in a manner that appears instantaneous and allows a host to continue accessing the source volume while the actual data transfers to the copy volume are deferred to a later time. The point-in-time copy appears instantaneous because notification of “complete” is returned to the copy operation as soon as the relationship data structures are generated, without copying the data from the source volume to the target volume. Point-in-time copy techniques, also referred to as point-in-time copies, such as IBM FlashCopy® (FlashCopy is a registered trademark of International Business Machines Corp. or “IBM”) and snapshot, typically defer transferring a data block or track, as it existed in the volume when the point-in-time copy relationship was established, to the repository until a write operation to that data block on the volume is requested. Data transfers may also proceed as a background copy process with minimal impact on system performance. The point-in-time copy relationships that are immediately established in response to the point-in-time copy command include a bitmap or other data structure indicating whether each block of the volume is located at the source volume or at the copy volume.
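The bookkeeping behind a point-in-time copy relationship can be sketched in Python as follows. This is a simplified model under stated assumptions, not the FlashCopy or snapshot implementation: volumes are hypothetical lists of block contents, and the class and method names are illustrative. Establishing the relationship only allocates a bitmap, reads of the point-in-time image are redirected through that bitmap, and a background step transfers one pending block at a time. Writes to the source are not modeled in this sketch.

```python
from typing import List


class PointInTimeCopy:
    """Hypothetical model of a point-in-time copy relationship.

    The bitmap holds one entry per block: True means the point-in-time
    data still resides on the source volume, False means it has already
    been transferred to the copy (target) volume.
    """

    def __init__(self, source: List[bytes]) -> None:
        self.source = source
        self.target: List[bytes] = [b""] * len(source)
        # Establishing the relationship only builds this bitmap; no data is
        # moved, which is why "complete" can be reported immediately.
        self.bitmap: List[bool] = [True] * len(source)

    def read_copy(self, index: int) -> bytes:
        """Read block `index` of the point-in-time image from wherever it lives."""
        return self.source[index] if self.bitmap[index] else self.target[index]

    def background_copy_step(self) -> bool:
        """Transfer one not-yet-copied block; return False when none remain."""
        for i, pending in enumerate(self.bitmap):
            if pending:
                self.target[i] = self.source[i]
                self.bitmap[i] = False
                return True
        return False
```

Calling `background_copy_step()` in a loop until it returns `False` mimics the background copy process; at any moment in between, `read_copy()` still returns the complete point-in-time image because the bitmap says where each block currently resides.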
Consistency of the backup is provided by a disk- or volume-level filter kernel driver, which uses copy-on-write (COW) technology to back up a consistent image of the volume as of the point in time when the backup was initiated. When an update is received for a block in a volume involved in a point-in-time or snapshot copy, the existing contents of that block must be copied to the repository before the update is applied to the block in the volume. This means that the backup is “hot”: the volume or disk can remain in use during the backup process, which may take a long time.
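The copy-on-write behavior of the filter driver can be approximated with the following Python sketch. The `CowFilter` class, the list-based volume, and the list-based repository are hypothetical stand-ins for a kernel filter driver and real storage. Each application write is intercepted; if the target block has not yet reached the repository, its existing contents are copied there first, so the repository ends up holding the image as of the point in time the backup started even though the volume stays in use.

```python
from typing import List, Optional


class CowFilter:
    """Hypothetical copy-on-write filter placed in front of a volume during a backup."""

    def __init__(self, volume: List[bytes]) -> None:
        self.volume = volume
        # Repository slots start empty; each slot is filled exactly once.
        self.repository: List[Optional[bytes]] = [None] * len(volume)
        # copied[i] is True once block i's point-in-time contents are in the repository.
        self.copied: List[bool] = [False] * len(volume)

    def write(self, index: int, data: bytes) -> None:
        """Intercept an application write to block `index`."""
        if not self.copied[index]:
            # Copy-on-write: preserve the old contents before applying the update.
            self.repository[index] = self.volume[index]
            self.copied[index] = True
        self.volume[index] = data

    def backup_next(self) -> bool:
        """Copy the next block not yet in the repository; return False when done."""
        for i, done in enumerate(self.copied):
            if not done:
                self.repository[i] = self.volume[i]
                self.copied[i] = True
                return True
        return False
```

For example, if the backup copies block 0, the application then overwrites blocks 0 and 2, and the backup runs to completion, the repository still holds the original contents of all three blocks while the volume holds the new data.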